Abstract

In this note, we address the theoretical properties of $\Delta_p$, a class of
compressed sensing decoders that rely on $\ell^p$ minimization with $0 < p < 1$
to recover estimates of sparse and compressible signals from incomplete and
inaccurate measurements. In particular, we extend the results of Candès,
Romberg and Tao [4] and Wojtaszczyk [30] regarding the decoder $\Delta_1$, based
on $\ell^1$ minimization, to $\Delta_p$ with $0 < p < 1$. Our results are
two-fold. First, we show that under certain sufficient conditions that are
weaker than the analogous sufficient conditions for $\Delta_1$, the decoders
$\Delta_p$ are robust to noise and stable in the sense that they are $(2,p)$
instance optimal for a large class of encoders. Second, we extend the results of
Wojtaszczyk to show that, like $\Delta_1$, the decoders $\Delta_p$ are $(2,2)$
instance optimal in probability provided the measurement matrix is drawn from an
appropriate distribution.



1 Introduction 

The sparse recovery problem has received a lot of attention lately, both because
of its role in transform coding with redundant dictionaries (e.g., [9,28,29]),
and perhaps more importantly because it inspired compressed sensing [3,4,13], a
novel method of acquiring signals with certain properties more efficiently
compared to the classical approach based on Nyquist-Shannon sampling theory.
Define $\Sigma_S^N$ to be the set of all $S$-sparse vectors, i.e.,

$$\Sigma_S^N := \{x \in \mathbb{R}^N : |\mathrm{supp}(x)| \le S\},$$

and define compressible vectors as vectors that can be well approximated in
$\Sigma_S^N$. Let $\sigma_S(x)_{\ell^p}$ denote the best $S$-term approximation
error of $x$ in the $\ell^p$ (quasi-)norm where $p > 0$, i.e.,

$$\sigma_S(x)_{\ell^p} := \min_{v \in \Sigma_S^N} \|x - v\|_p.$$
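The best $S$-term approximation error can be computed directly, because the minimizing $v$ keeps the $S$ largest-magnitude entries of $x$. The following is a minimal sketch; the function name `best_s_term_error` is ours and does not come from the paper.

```python
def best_s_term_error(x, S, p):
    """Best S-term approximation error sigma_S(x) in the l^p (quasi-)norm, p > 0."""
    if p <= 0:
        raise ValueError("p must be positive")
    # The optimal S-sparse v keeps the S largest-magnitude entries of x,
    # so the error is the l^p (quasi-)norm of the remaining entries.
    tail = sorted((abs(t) for t in x), reverse=True)[S:]
    return sum(t ** p for t in tail) ** (1.0 / p)


x = [3.0, -4.0, 0.5, 0.0]
print(best_s_term_error(x, 2, 2))  # error after keeping the entries -4 and 3
print(best_s_term_error(x, 1, 1))  # error after keeping only the entry -4
```

A vector is compressible in the sense above when this quantity is small for moderate $S$.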

Throughout the text, $A$ denotes an $M \times N$ real matrix where $M < N$. Let
the associated encoder be the map $x \mapsto Ax$ (also denoted by $A$). The
transform coding and compressed sensing problems mentioned above require the
existence of decoders, say $\Delta : \mathbb{R}^M \to \mathbb{R}^N$, with
roughly the following properties:

(C1) $\Delta(Ax) = x$ whenever $x \in \Sigma_S^N$ with sufficiently small $S$.

(C2) $\|x - \Delta(Ax + e)\| \lesssim \|e\| + \sigma_S(x)_{\ell^p}$, where the
norms are appropriately chosen. Here $e$ denotes measurement error, e.g.,
thermal and computational noise.

(C3) $\Delta(Ax)$ can be computed efficiently (in some sense).

Below, we denote the (in general noisy) encoding of x by b, i.e., 

b = Ax + e. (1) 

In general, the problem of constructing decoders with properties (C1)-(C3) is
non-trivial (even in the noise-free case) as $A$ is overcomplete, i.e., the
linear system of $M$ equations in (1) is underdetermined, and thus, if
consistent, it admits infinitely many solutions. In order for a decoder to
satisfy (C1)-(C3), it must choose the "correct solution" among these infinitely
many solutions. Under the assumption that the original signal $x$ is sparse, one
can phrase the problem of finding the desired solution as an optimization
problem where the objective is to maximize an appropriate "measure of sparsity"
while simultaneously satisfying the constraints defined by (1). In the
noise-free case, i.e., when $e = 0$ in (1), under certain conditions on the
$M \times N$ matrix $A$, i.e., if $A$ is in general position, there is a decoder
$\Delta_0$ which satisfies $\Delta_0(Ax) = x$ for all $x \in \Sigma_S^N$
whenever $S < M/2$, e.g., see [14]. This $\Delta_0$ can be explicitly computed
via the optimization problem

$$\Delta_0(b) := \arg\min_y \|y\|_0 \ \text{subject to}\ b = Ay. \qquad (2)$$

Here $\|y\|_0$ denotes the number of non-zero entries of the vector $y$,
equivalently its so-called $\ell^0$-norm. Clearly, the sparsity of $y$ is
reflected by its $\ell^0$-norm.
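To make the non-uniqueness concrete, the sketch below uses a toy $2 \times 3$ encoder (a hypothetical example, not one from the paper) and walks along the one-dimensional family of solutions of $Ay = b$. The helper `lp_measure` returns $\|y\|_0$ for $p = 0$ and $\sum_i |y_i|^p$ (that is, $\|y\|_p^p$) otherwise; in this example all three measures are smallest at the sparse solution.

```python
def lp_measure(y, p):
    """Return ||y||_0 (count of non-zeros) for p == 0, else sum_i |y_i|^p."""
    if p == 0:
        return sum(1 for t in y if t != 0)
    return sum(abs(t) ** p for t in y)


def encode(A, y):
    """Matrix-vector product A y for a matrix stored as a list of rows."""
    return [sum(a * t for a, t in zip(row, y)) for row in A]


# Toy encoder (hypothetical, M = 2, N = 3) and a 1-sparse signal.
A = [[1.0, 1.0, 0.0],
     [0.0, 1.0, 1.0]]
x = [0.0, 1.0, 0.0]
b = encode(A, x)


def solution(t):
    """Every y(t) = x + t * (1, -1, 1) solves A y = b, since (1, -1, 1) spans ker(A)."""
    return [x[0] + t, x[1] - t, x[2] + t]


for t in (0.0, 0.5, 1.0):
    y = solution(t)
    assert encode(A, y) == b  # all of these are consistent with the measurements
    print(t, lp_measure(y, 0), lp_measure(y, 1), round(lp_measure(y, 0.5), 3))
```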

1.1 Decoding by $\ell^1$ minimization

As mentioned above, $\Delta_0(Ax) = x$ exactly if $x$ is sufficiently sparse
depending on the matrix $A$. However, the associated optimization problem is
combinatorial in nature, thus its complexity grows quickly as $N$ becomes much
larger than $M$. Naturally, one then seeks to modify the optimization problem so
that it lends itself to solution methods that are more tractable than
combinatorial search. In fact, in the noise-free setting, the decoder defined by
$\ell^1$ minimization, given by

$$\Delta_1(b) := \arg\min_y \|y\|_1 \ \text{subject to}\ Ay = b, \qquad (3)$$

recovers $x$ exactly if $x$ is sufficiently sparse and the matrix $A$ has
certain properties (e.g., [4,6,9,14,15,26]). In particular, it has been shown in
[4] that if $x \in \Sigma_S^N$ and $A$ satisfies a certain restricted isometry
property, e.g., $\delta_{3S} < 1/3$ or more generally
$\delta_{(k+1)S} < \frac{k-1}{k+1}$ for some $k > 1$ such that
$k \in \frac{1}{S}\mathbb{N}$, then $\Delta_1(Ax) = x$ (in what follows,
$\mathbb{N}$ denotes the set of positive integers, i.e., $0 \notin \mathbb{N}$).
Here $\delta_S$ are the $S$-restricted isometry constants of $A$, as introduced
by Candès, Romberg and Tao (see, e.g., [4]), defined as the smallest constants
satisfying

$$(1 - \delta_S)\|c\|_2^2 \le \|Ac\|_2^2 \le (1 + \delta_S)\|c\|_2^2 \qquad (4)$$

for every $c \in \Sigma_S^N$. Throughout the paper, using the notation of [30],
we say that a matrix satisfies RIP$(S, \delta)$ if $\delta_S < \delta$.

Checking whether a given matrix satisfies a certain RIP is computationally
intensive, and rapidly becomes intractable as the size of the matrix increases.
On the other hand, there are certain classes of random matrices which have
favorable RIP. In fact, let $A$ be an $M \times N$ matrix the columns of which
are independent, identically distributed (i.i.d.) random vectors with any
sub-Gaussian distribution. It has been shown that $A$ satisfies RIP$(S, \delta)$
with any $0 < \delta < 1$ when

$$S \le c_1 M / \log(N/M), \qquad (5)$$

with probability greater than $1 - 2e^{-c_2 M}$ (see, e.g., [1], [5], [6]),
where $c_1$ and $c_2$ are positive constants that only depend on $\delta$ and on
the actual distribution from which $A$ is drawn.
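Because $\delta_S$ is the smallest constant for which (4) holds over all $S$-sparse $c$, any finite sample of sparse vectors can only bound it from below. The sketch below illustrates this with a simple Monte Carlo search; the sampling scheme (random supports with Gaussian coefficients), the matrix size, and the function names are our assumptions, not a procedure taken from the paper.

```python
import math
import random


def gaussian_matrix(M, N, rng):
    """M x N matrix with i.i.d. zero-mean Gaussian entries of standard deviation 1/sqrt(M)."""
    s = 1.0 / math.sqrt(M)
    return [[rng.gauss(0.0, s) for _ in range(N)] for _ in range(M)]


def rip_lower_bound(A, S, trials, rng):
    """Monte Carlo lower bound on delta_S.

    Returns the maximum of | ||Ac||_2^2 / ||c||_2^2 - 1 | over randomly sampled
    S-sparse vectors c. The true delta_S is at least this large.
    """
    M, N = len(A), len(A[0])
    worst = 0.0
    for _ in range(trials):
        support = rng.sample(range(N), S)
        coeffs = [rng.gauss(0.0, 1.0) for _ in support]
        norm_c_sq = sum(v * v for v in coeffs)
        Ac = [sum(A[i][j] * v for j, v in zip(support, coeffs)) for i in range(M)]
        ratio = sum(t * t for t in Ac) / norm_c_sq
        worst = max(worst, abs(ratio - 1.0))
    return worst


A = gaussian_matrix(20, 60, random.Random(0))
print(rip_lower_bound(A, 2, 200, random.Random(1)))
```

Increasing `trials` can only raise the estimate, which is why such estimates are optimistic when plugged into RIP-based sufficient conditions.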

In addition to recovering sparse vectors from error-free observations, it is
important that the decoder be robust to noise and stable with regard to the
"compressibility" of $x$. In other words, we require that the reconstruction
error scale well with the measurement error and with the "non-sparsity" of the
signal (i.e., (C2) above). For matrices that satisfy RIP$((k+1)S, \delta)$, with
$\delta < \frac{k-1}{k+1}$ for some $k > 1$ such that
$k \in \frac{1}{S}\mathbb{N}$, it has been shown in [4] that there exists a
feasible decoder $\Delta_1^\epsilon$ for which the approximation error
$\|\Delta_1^\epsilon(b) - x\|_2$ scales linearly with the measurement error
$\|e\|_2 \le \epsilon$ and with $\sigma_S(x)_{\ell^1}$. More specifically,
define the decoder

$$\Delta_1^\epsilon(b) = \arg\min_y \|y\|_1 \ \text{subject to}\ \|Ay - b\|_2 \le \epsilon. \qquad (6)$$

The following theorem of Candès et al. in [4] provides error guarantees when
$x$ is not sparse and when the observation is noisy.

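Theorem 1.1 [4] Fix $\epsilon \ge 0$, suppose that $x$ is arbitrary, and let
$b = Ax + e$ where $\|e\|_2 \le \epsilon$. If $A$ satisfies
$\delta_{3S} + 3\delta_{4S} < 2$, then

$$\|\Delta_1^\epsilon(b) - x\|_2 \le C_{1,S}\,\epsilon + C_{2,S}\,\frac{\sigma_S(x)_{\ell^1}}{\sqrt{S}}. \qquad (7)$$

For reasonable values of $\delta_{4S}$, the constants are well behaved; e.g.,
$C_{1,S} = 12.04$ and $C_{2,S} = 8.77$ for $\delta_{4S} = 1/5$.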

Remark 1.2 This means that given $b = Ax + e$ with $x$ sufficiently sparse,
$\Delta_1^\epsilon(b)$ recovers the underlying sparse signal within the noise
level. Consequently the recovery is perfect if $\epsilon = 0$.

Remark 1.3 By explicitly assuming $x$ to be sparse, Candès et al. [4] proved a
version of the above result with smaller constants, i.e., for $b = Ax + e$ with
$x \in \Sigma_S^N$ and $\|e\|_2 \le \epsilon$,

$$\|\Delta_1^\epsilon(b) - x\|_2 \le C_S\,\epsilon, \qquad (8)$$

where $C_S < C_{1,S}$.

Remark 1.4 Recently, Candès [2] showed that $\delta_{2S} < \sqrt{2} - 1$ is
sufficient to guarantee robust and stable recovery in the sense of (7) with
slightly better constants.

In the noise-free case, i.e., when $e = 0$, the reconstruction error in Theorem
1.1 is bounded above by a constant multiple of $\sigma_S(x)_{\ell^1}/\sqrt{S}$,
see (7). This upper bound would sharpen if one could replace
$\sigma_S(x)_{\ell^1}/\sqrt{S}$ with $\sigma_S(x)_{\ell^2}$ on the right hand
side of (7) (note that $\sigma_S(x)_{\ell^1}$ can be large even if all the
entries of the reconstruction error are small but nonzero; this follows from the
fact that for any vector $y \in \mathbb{R}^N$,
$\|y\|_2 \le \|y\|_1 \le \sqrt{N}\|y\|_2$, and consequently there are vectors
$x \in \mathbb{R}^N$ for which
$\sigma_S(x)_{\ell^1}/\sqrt{S} \gg \sigma_S(x)_{\ell^2}$, especially when $N$ is
large). In [10] it was shown that the term
$C_{2,S}\,\sigma_S(x)_{\ell^1}/\sqrt{S}$ on the right hand side of (7) cannot be
replaced with $C\sigma_S(x)_{\ell^2}$ if one seeks the inequality to hold for
all $x \in \mathbb{R}^N$ with a fixed matrix $A$, unless $M > cN$ for some
constant $c$. This is unsatisfactory since the paradigm of compressed sensing
relies on the ability of recovering sparse or compressible vectors $x$ from
significantly fewer measurements than the ambient dimension $N$.
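A simple worked instance of the gap between the two error measures, using the all-ones vector as an illustrative choice of ours:

```latex
\[
x = (1,\dots,1) \in \mathbb{R}^N:\qquad
\sigma_S(x)_{\ell^1} = N - S,\qquad
\sigma_S(x)_{\ell^2} = \sqrt{N - S},
\]
\[
\frac{\sigma_S(x)_{\ell^1}/\sqrt{S}}{\sigma_S(x)_{\ell^2}}
  = \sqrt{\frac{N - S}{S}} \;\longrightarrow\; \infty
  \quad\text{as } N \to \infty \text{ for fixed } S .
\]
```

So for fixed $S$ the $\ell^1$-based bound can exceed the $\ell^2$-based quantity by a factor that grows like $\sqrt{N/S}$.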



Even though one cannot obtain bounds on the approximation error in terms of
$\sigma_S(x)_{\ell^2}$ with constants that are uniform in $x$ (with a fixed
matrix $A$), the situation is significantly better if we relax the uniformity
requirement and seek a version of (7) that holds "with high probability".
Indeed, it has been recently shown by Wojtaszczyk that for any specific $x$,
$\sigma_S(x)_{\ell^2}$ can be placed in (7) in lieu of
$\sigma_S(x)_{\ell^1}/\sqrt{S}$ (with different constants that are still
independent of $x$) with high probability on the draw of $A$ if (i)
$M > cS\log N$ and (ii) the entries of $A$ are drawn i.i.d. from a Gaussian
distribution or the columns of $A$ are drawn i.i.d. from the uniform
distribution on the unit sphere in $\mathbb{R}^M$ [30]. In other words, the
decoder $\Delta_1 = \Delta_1^0$ is "(2,2) instance optimal in probability" for
encoders associated with such $A$, a property which was discussed in [10].

Following the notation of [30], we say that an encoder-decoder pair
$(A, \Delta)$ is $(q,p)$ instance optimal of order $S$ with constant $C$ if

$$\|\Delta(Ax) - x\|_q \le C\,\frac{\sigma_S(x)_{\ell^p}}{S^{1/p - 1/q}} \qquad (9)$$

holds for all $x \in \mathbb{R}^N$. Moreover, for random matrices $A_\omega$,
$(A_\omega, \Delta)$ is said to be $(q,p)$ instance optimal in probability if
for any $x$ (9) holds with high probability on the draw of $A_\omega$. Note that
with this notation Theorem 1.1 implies that $(A, \Delta_1)$ is (2,1) instance
optimal (set $\epsilon = 0$), provided $A$ satisfies the conditions of the
theorem.
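The last claim can be checked by substituting $q = 2$ and $p = 1$ into the definition; the short derivation below spells this out.

```latex
\[
q = 2,\; p = 1:\qquad S^{1/p - 1/q} = S^{1 - 1/2} = \sqrt{S},
\]
\[
\text{Theorem 1.1 with } \epsilon = 0:\qquad
\|\Delta_1(Ax) - x\|_2 \;\le\; C_{2,S}\,\frac{\sigma_S(x)_{\ell^1}}{\sqrt{S}},
\]
\[
\text{which is (9) with } (q,p) = (2,1) \text{ and } C = C_{2,S}.
\]
```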

The preceding discussion makes it clear that $\Delta_1$ satisfies conditions
(C1) and (C2), at least when $A$ is a sub-Gaussian random matrix and $S$ is
sufficiently small. It only remains to note that decoding by $\Delta_1$ amounts
to solving an $\ell^1$ minimization problem, and is thus tractable, i.e., we
also have (C3). In fact, $\ell^1$ minimization problems as described above can
be solved efficiently with solvers specifically designed for the sparse recovery
scenarios (e.g., [27], [16], [11]).



1.2 Decoding by $\ell^p$ minimization



We have so far seen that with appropriate encoders, the decoders
$\Delta_1^\epsilon$ provide robust and stable recovery for compressible signals
even when the measurements are noisy [4], and that $(A_\omega, \Delta_1)$ is
(2,2) instance optimal in probability [30] when $A_\omega$ is an appropriate
random matrix. In particular, stability and robustness properties are
conditioned on an appropriate RIP while the instance optimality property is
dependent on the draw of the encoder matrix (which is typically called the
measurement matrix) from an appropriate distribution, in addition to RIP.

Recall that the decoders $\Delta_1$ and $\Delta_1^\epsilon$ were devised because
their action can be computed by solving convex approximations to the
combinatorial optimization problem (2) that is required to compute $\Delta_0$.
The decoders defined by

$$\Delta_p^\epsilon(b) := \arg\min_y \|y\|_p \ \text{s.t.}\ \|Ay - b\|_2 \le \epsilon, \ \text{and} \qquad (10)$$
$$\Delta_p(b) := \arg\min_y \|y\|_p \ \text{s.t.}\ Ay = b, \qquad (11)$$

with $0 < p < 1$ are also approximations of $\Delta_0$, the actions of which are
computed via non-convex optimization problems that can be solved, at least
locally, still much faster than (2). It is natural to ask whether the decoders
$\Delta_p$ and $\Delta_p^\epsilon$ possess robustness, stability, and instance
optimality properties similar to those of $\Delta_1$ and $\Delta_1^\epsilon$,
and whether these are obtained under weaker conditions on the measurement
matrices than the analogous ones with $p = 1$.

Early work by Gribonval and co-authors [19-22] takes some initial steps in
answering these questions. In particular, they devise metrics that lead to
sufficient conditions for uniqueness of $\Delta_1(b)$ to imply uniqueness of
$\Delta_p(b)$ and specifically for having $\Delta_p(b) = \Delta_1(b) = x$. The
authors also present stability conditions in terms of various norms that bound
the error, and they conclude that the smaller the value of $p$ is, the more
non-zero entries can be recovered by (11). These conditions, however, are hard
to check explicitly and no class of deterministic or random matrices was shown
to satisfy them at least with high probability. On the other hand, the authors
provide lower bounds for their metrics in terms of generalized mutual coherence.
Still, these conditions are pessimistic in the sense that they generally
guarantee recovery of only very sparse vectors.

Recently, Chartrand showed that in the noise-free setting, a sufficiently sparse
signal can be recovered perfectly with $\Delta_p$, where $0 < p < 1$, under less
restrictive RIP requirements than those needed to guarantee perfect recovery
with $\Delta_1$. The following theorem was proved in [7].

Theorem 1.5 [7] Let $0 < p \le 1$, and let $S \in \mathbb{N}$. Suppose that $x$
is $S$-sparse, and set $b = Ax$. If $A$ satisfies
$\delta_{kS} + k^{\frac{2}{p}-1}\delta_{(k+1)S} < k^{\frac{2}{p}-1} - 1$ for
some $k > 1$ such that $k \in \frac{1}{S}\mathbb{N}$, then $\Delta_p(b) = x$.

Note that, for example, when $p = 0.5$ and $k = 3$, the above theorem only
requires $\delta_{3S} + 27\delta_{4S} < 26$ to guarantee perfect recovery with
$\Delta_{0.5}$, a less restrictive condition than the analogous one needed to
guarantee perfect reconstruction with $\Delta_1$, i.e.,
$\delta_{3S} + 3\delta_{4S} < 2$. Moreover, in [8], Staneva and Chartrand study
a modified RIP that is defined by replacing $\|Ac\|_2$ in (4) with $\|Ac\|_p$.
They show that under this new definition of $\delta_S$, the same sufficient
condition as in Theorem 1.5 guarantees perfect recovery. Staneva and Chartrand
also show that if $A$ is an $M \times N$ Gaussian matrix, their sufficient
condition is satisfied provided $M > C_1(p)S + pC_2(p)S\log(N/S)$, where
$C_1(p)$ and $C_2(p)$ are given explicitly in [8]. It is important to note that
$pC_2(p)$ goes to zero as $p$ goes to zero. In other words, the dependence on
$N$ of the required number of measurements $M$ (that guarantees perfect recovery
for all $x \in \Sigma_S^N$) disappears as $p$ approaches 0. This result
motivates a more detailed study to understand the properties of the decoders
$\Delta_p$ in terms of stability and robustness, which is the objective of this
paper.

1.2.1 Algorithmic Issues 

Clearly, recovery by $\ell^p$ minimization poses a non-convex optimization
problem with many local minimizers. It is encouraging that simulation results
from recent papers, e.g., [7,25], strongly indicate that simple modifications to
known approaches like iterated reweighted least squares algorithms and projected
gradient algorithms yield $x^*$ that are the global minimizers of the associated
$\ell^p$ minimization problem (or approximate the global optimizers very well).
It is also encouraging to note that even though the results presented in this
work and in others [7,19-22,25] assume that the global minimizer has been found,
a significant set of these results, including all results in this paper,
continue to hold if we could obtain a feasible point $\tilde{x}^*$ which
satisfies $\|\tilde{x}^*\|_p \le \|x\|_p$ (where $x$ is the vector to be
recovered). Nevertheless, it should be stated that to our knowledge, the
modified algorithms mentioned above have only been shown to converge to local
minima.



1.3 Paper Outline 

In what follows, we present generalizations of the above results, giving
stability and robustness guarantees for $\ell^p$ minimization. In Section 2.1 we
show that the decoders $\Delta_p$ and $\Delta_p^\epsilon$ are robust to noise
and $(2,p)$ instance optimal in the case of appropriate measurement matrices.
For this section we rely and expand on our note [25]. In Section 2.3 we extend
[30] and show that for the same range of dimensions as for decoding by $\ell^1$
minimization, i.e., when $A_\omega \in \mathbb{R}^{M \times N}$ with
$M > cS\log(N)$, $(A_\omega, \Delta_p)$ is also (2,2) instance optimal in
probability for $0 < p < 1$, provided the measurement matrix $A_\omega$ is drawn
from an appropriate distribution. The generalization follows the proof of
Wojtaszczyk in [30]; however, it is non-trivial and requires a variant of a
result by Gordon and Kalton [18] on the Banach-Mazur distance between a
$p$-convex body and its convex hull. In Section 3 we present some numerical
results, further illustrating the possible benefits of using $\ell^p$
minimization and highlighting the behavior of the $\Delta_p$ decoder in terms of
stability and robustness. Finally, in Section 4 we present the proofs of the
main theorems and corollaries.

While writing this paper, we became aware of the work of Foucart and Lai [17]
which also shows similar $(2,p)$ instance optimality results for $0 < p < 1$
under different sufficient conditions. In essence, one could use the
$(2,p)$-results of Foucart and Lai to obtain (2,2) instance optimality in
probability results similar to the ones we present in this paper, albeit with
different constants. Since neither the sufficient conditions for $(2,p)$
instance optimality presented in [17] nor the ones in this paper are uniformly
weaker, and since neither provides uniformly better constants, we simply use our
estimates throughout.



2 Main Results 



In this section, we present our theoretical results on the ability of $\ell^p$
minimization to recover sparse and compressible signals in the presence of
noise.



2.1 Sparse recovery with $\Delta_p$: stability and robustness



We begin with a deterministic stability and robustness theorem for decoders
$\Delta_p$ and $\Delta_p^\epsilon$ when $0 < p < 1$ that generalizes Theorem 1.1
of Candès et al. Note that the associated sufficient conditions on the
measurement matrix, given in (12) below, are weaker for smaller values of $p$
than those that correspond to $p = 1$. The results in this subsection were
initially reported, in part, in [25].

In what follows, we say that a matrix $A$ satisfies the property $P(k, S, p)$ if
it satisfies

$$\delta_{kS} + k^{\frac{2}{p}-1}\delta_{(k+1)S} < k^{\frac{2}{p}-1} - 1, \qquad (12)$$

for $S \in \mathbb{N}$ and $k > 1$ such that $k \in \frac{1}{S}\mathbb{N}$.
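Checking $P(k, S, p)$ for given restricted isometry constants is a one-line computation. The sketch below (function name ours) evaluates the condition $\delta_{kS} + k^{2/p-1}\delta_{(k+1)S} < k^{2/p-1} - 1$ and shows how lowering $p$ relaxes it for the same constants.

```python
def satisfies_P(delta_kS, delta_k1S, k, p):
    """Check delta_{kS} + k^(2/p - 1) * delta_{(k+1)S} < k^(2/p - 1) - 1."""
    if not (0 < p <= 1) or k <= 1:
        raise ValueError("need 0 < p <= 1 and k > 1")
    w = k ** (2.0 / p - 1.0)
    return delta_kS + w * delta_k1S < w - 1.0


# Same restricted isometry constants, two values of p (k = 3):
print(satisfies_P(0.5, 0.5, 3, 1.0))  # condition reads 0.5 + 3 * 0.5 < 2
print(satisfies_P(0.5, 0.5, 3, 0.5))  # condition reads 0.5 + 27 * 0.5 < 26
```

With $\delta_{3S} = \delta_{4S} = 0.5$ (an illustrative choice), the condition fails for $p = 1$ and holds for $p = 0.5$.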

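Theorem 2.1 (General Case) Let $0 < p \le 1$. Suppose that $x$ is arbitrary and
$b = Ax + e$ where $\|e\|_2 \le \epsilon$. If $A$ satisfies $P(k, S, p)$, then

$$\|\Delta_p^\epsilon(b) - x\|_2^p \le C_1\,\epsilon^p + C_2\,\frac{\sigma_S(x)_{\ell^p}^p}{S^{1-p/2}}, \qquad (13)$$

where

$$C_1 = 2^p\,\frac{1 + k^{p/2-1}(2/p-1)^{-p/2}}{(1-\delta_{(k+1)S})^{p/2} - (1+\delta_{kS})^{p/2}k^{p/2-1}}, \ \text{and} \qquad (14)$$

$$C_2 = \frac{2\left(\frac{p}{2-p}\right)^{p/2}}{k^{1-p/2}}\left[1 + \frac{\left((2/p-1)^{p/2} + k^{p/2-1}\right)(1+\delta_{kS})^{p/2}}{(1-\delta_{(k+1)S})^{p/2} - (1+\delta_{kS})^{p/2}k^{p/2-1}}\right]. \qquad (15)$$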

Remark 2.2 By setting $p = 1$ and $k = 3$ in Theorem 2.1 we obtain Theorem 1.1
with precisely the same constants.

Remark 2.3 The constants in Theorem 2.1 are generally well behaved; e.g.,
$C_1 = 5.31$ and $C_2 = 4.31$ for $\delta_{4S} = 0.5$ and $p = 0.5$. Note that
for $\delta_{4S} = 0.5$ the sufficient condition (12) is not satisfied when
$p = 1$, and thus Theorem 2.1 does not yield any upper bounds on
$\|\Delta_1(b) - x\|_2$ in terms of $\sigma_S(x)_{\ell^1}$.
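The quoted constants can be reproduced numerically. The sketch below evaluates $C_1$ and $C_2$ using the closed forms written in the code comments; these are our reading of (14) and (15), chosen because they reproduce the values quoted in the text, and should be treated as an assumption. We also assume $k = 3$ and $\delta_{3S} = \delta_{4S}$, since the text only specifies $\delta_{4S}$.

```python
def theorem_2_1_constants(p, k, delta_kS, delta_k1S):
    """Evaluate C_1 and C_2 (assumed reading of (14) and (15)).

    With h = p/2 and D = (1 - delta_{(k+1)S})^h - (1 + delta_{kS})^h * k^(h - 1):
      C_1 = 2^p * (1 + k^(h - 1) * (2/p - 1)^(-h)) / D
      C_2 = 2 * (p / (2 - p))^h / k^(1 - h)
            * (1 + ((2/p - 1)^h + k^(h - 1)) * (1 + delta_{kS})^h / D)
    """
    h = p / 2.0
    denom = (1.0 - delta_k1S) ** h - (1.0 + delta_kS) ** h * k ** (h - 1.0)
    if denom <= 0:
        raise ValueError("denominator is not positive, so P(k, S, p) fails")
    C1 = 2.0 ** p * (1.0 + k ** (h - 1.0) * (2.0 / p - 1.0) ** (-h)) / denom
    C2 = (2.0 * (p / (2.0 - p)) ** h / k ** (1.0 - h)) * (
        1.0 + ((2.0 / p - 1.0) ** h + k ** (h - 1.0)) * (1.0 + delta_kS) ** h / denom
    )
    return C1, C2


for p, delta in ((1.0, 0.2), (0.5, 0.5)):
    C1, C2 = theorem_2_1_constants(p, 3, delta, delta)
    print(f"p={p}, delta={delta}: C1={C1:.2f}, C2={C2:.2f}")
```

The denominator is positive exactly when (12) holds, which is why the constants blow up as the restricted isometry constants approach the boundary of the sufficient condition.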

Corollary 2.4 ((2,p) instance optimality) Let $0 < p \le 1$. Suppose that $A$
satisfies $P(k, S, p)$. Then $(A, \Delta_p)$ is $(2,p)$ instance optimal of
order $S$ with constant $C_2^{1/p}$ where $C_2$ is as in (15).

Corollary 2.5 (sparse case) Let $0 < p \le 1$. Suppose $x \in \Sigma_S^N$ and
$b = Ax + e$ where $\|e\|_2 \le \epsilon$. If $A$ satisfies $P(k, S, p)$, then

$$\|\Delta_p^\epsilon(b) - x\|_2 \le (C_1)^{1/p}\,\epsilon,$$

where $C_1$ is as in (14).



Remark 2.6 Corollaries 2.4 and 2.5 follow from Theorem 2.1 by setting
$\epsilon = 0$ and $\sigma_S(x)_{\ell^p} = 0$, respectively. Furthermore,
Corollary 2.5 can be proved independently of Theorem 2.1, leading to smaller
constants. See [25] for the explicit values of these improved constants.
Finally, note that setting $\epsilon = 0$ in Corollary 2.5 we obtain Theorem 1.5
as a corollary.

Remark 2.7 In [17], Foucart and Lai give different sufficient conditions for
exact recovery than those we present. In particular, they show that if

$$\delta_{mS} < g(m) := \frac{4(\sqrt{2}-1)(m/2)^{1/p-1/2}}{4(\sqrt{2}-1)(m/2)^{1/p-1/2} + 2} \qquad (16)$$

holds for some $m \ge 2$, $m \in \frac{1}{S}\mathbb{N}$, then $\Delta_p$ will
recover signals in $\Sigma_S^N$ exactly. Note that the sufficient condition in
this paper, i.e., (12), holds when

$$\delta_{mS} < f(m) := \frac{(m-1)^{\frac{2}{p}-1} - 1}{(m-1)^{\frac{2}{p}-1} + 1} \qquad (17)$$

for some $m \ge 2$, $m \in \frac{1}{S}\mathbb{N}$. In Figure 1, we compare these
different sufficient conditions as a function of $m$ for $p = 0.1$, $0.5$, and
$0.9$ respectively. Figure 1 indicates that neither sufficient condition is
weaker than the other for all values of $m$. In fact, we can deduce that (16) is
weaker when $m$ is close to 2, while (17) is weaker when $m$ starts to grow
larger. Since both conditions are only sufficient, if either one of them holds
for an appropriate $m$, then $\Delta_p$ recovers all signals in $\Sigma_S^N$.
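The two thresholds are easy to tabulate. The sketch below evaluates $g(m) = \frac{t}{t+2}$ with $t = 4(\sqrt{2}-1)(m/2)^{1/p-1/2}$ and $f(m) = \frac{w-1}{w+1}$ with $w = (m-1)^{2/p-1}$, and reports which condition is weaker (i.e., has the larger threshold). The function names and the sampled values of $m$ are our choices.

```python
import math


def g_foucart_lai(m, p):
    """Threshold g(m) = t / (t + 2), t = 4 (sqrt(2) - 1) (m/2)^(1/p - 1/2)."""
    t = 4.0 * (math.sqrt(2.0) - 1.0) * (m / 2.0) ** (1.0 / p - 0.5)
    return t / (t + 2.0)


def f_this_paper(m, p):
    """Threshold f(m) = (w - 1) / (w + 1), w = (m - 1)^(2/p - 1)."""
    w = (m - 1.0) ** (2.0 / p - 1.0)
    return (w - 1.0) / (w + 1.0)


for p in (0.1, 0.5, 0.9):
    for m in (2, 3, 5, 10):
        g, f = g_foucart_lai(m, p), f_this_paper(m, p)
        weaker = "g" if g > f else "f"
        print(f"p={p}, m={m}: g={g:.4f}, f={f:.4f}, weaker condition: {weaker}")
```

At $m = 2$ the threshold $f$ vanishes, so only the condition based on $g$ can hold there; for larger $m$ the ordering reverses.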

Fig. 1. A comparison of the sufficient conditions on $\delta_{mS}$ in (17) and
(16) as a function of $m$, for $p = 0.1$ (top), $p = 0.5$ (center) and $p = 0.9$
(bottom).

Remark 2.8 In [12], Davies and Gribonval showed that if one chooses
$\delta_{2S} > \delta(p)$ (where $\delta(p)$ can be computed implicitly for
$0 < p \le 1$), then there exist matrices (matrices in
$\mathbb{R}^{(N-1) \times N}$ that correspond to tight Parseval frames in
$\mathbb{R}^{N-1}$) with the prescribed $\delta_{2S}$ for which $\Delta_p$ fails
to recover signals in $\Sigma_S^N$. Note that this result does not contradict
the results that we present in this paper: we provide sufficient conditions
(e.g., (12)) in terms of $\delta_{(k+1)S}$, where $k > 1$ and
$kS \in \mathbb{N}$, that guarantee recovery by $\Delta_p$. These conditions are
weaker than the corresponding conditions ensuring recovery by $\Delta_1$, which
suggests that using $\Delta_p$ can be beneficial. Moreover, the numerical
examples we provide in Section 3 indicate that by using $\Delta_p$,
$0 < p < 1$, one can indeed recover signals in $\Sigma_S^N$, even when
$\Delta_1$ fails to recover them (see Figure 2).

Remark 2.9 In summary, Theorem 2.1 states that if (12) is satisfied then we can
recover signals in $\Sigma_S^N$ stably by decoding with $\Delta_p^\epsilon$. It
is worth mentioning that the sufficient conditions presented here reduce the gap
between the conditions for exact recovery with $\Delta_0$ (i.e.,
$\delta_{2S} < 1$) and with $\Delta_1$, e.g., $\delta_{3S} < 1/3$. For example,
for $k = 2$ and $p = 0.5$, $\delta_{3S} < 7/9$ is sufficient. In the next
subsection, we quantify this improvement.



2.2 The relationship between $S_1$ and $S_p$



Let $A$ be an $M \times N$ matrix and suppose $\delta_m$,
$m \in \{1, \ldots, \lfloor M/2 \rfloor\}$ are its $m$-restricted isometry
constants. Define $S_p$ for $A$ with $0 < p \le 1$ as the largest value of
$S \in \mathbb{N}$ for which the slightly stronger version of (12) given by

$$\delta_{(k+1)S} < \frac{k^{\frac{2}{p}-1} - 1}{k^{\frac{2}{p}-1} + 1} \qquad (18)$$

holds for some $k > 1$, $k \in \frac{1}{S}\mathbb{N}$. Consequently, by Theorem
2.1, $\Delta_p(Ax) = x$ for all $x \in \Sigma_{S_p}^N$. We now establish a
relationship between $S_1$ and $S_p$.



Proposition 2.10 Suppose, in the above described setting, there exist
$S_1 \in \mathbb{N}$ and $k > 1$, $k \in \frac{1}{S_1}\mathbb{N}$ such that

$$\delta_{(k+1)S_1} < \frac{k-1}{k+1}. \qquad (19)$$

Then $\Delta_1$ recovers all $S_1$-sparse vectors, and $\Delta_p$ recovers all
$S_p$-sparse vectors with

$$S_p = \left\lfloor \frac{k+1}{k^{\frac{p}{2-p}} + 1}\,S_1 \right\rfloor.$$



Remark 2.11 For example, if $\delta_{5S_1} < 3/5$ then using $\Delta_{2/3}$, we
can recover all $S_{2/3}$-sparse vectors with
$S_{2/3} = \lfloor \frac{5}{3}S_1 \rfloor$.


2.3 Instance optimality in probability and $\Delta_p$



In this section, we show that $(A_\omega, \Delta_p)$ is (2,2) instance optimal
in probability when $A_\omega$ is an appropriate random matrix. Our approach is
based on that of [30], which we summarize now. A matrix $A$ is said to possess
the LQ$_1(\alpha)$ property if and only if

$$A(B_1^N) \supset \alpha B_2^M,$$

where $B_q^n$ denotes the $\ell^q$ unit ball in $\mathbb{R}^n$. In [30],
Wojtaszczyk shows that random Gaussian matrices of size $M \times N$ as well as
matrices whose columns are drawn uniformly from the sphere possess, with high
probability, the LQ$_1(\alpha)$ property with
$\alpha = \mu\sqrt{\log(N/M)/M}$. Noting that such matrices also satisfy
RIP$((k+1)S, \delta)$ with $S < c\,\frac{M}{\log(N/M)}$, again with high
probability, Wojtaszczyk proves that $\Delta_1$, for these matrices, is (2,2)
instance optimal in probability of order $S$. Our strategy for generalizing this
result to $\Delta_p$ with $0 < p < 1$ relies on a generalization of the LQ$_1$
property to an LQ$_p$ property. Specifically, we say that a matrix $A$ satisfies
LQ$_p(\alpha)$ if and only if

$$A(B_p^N) \supset \alpha B_2^M.$$

We first show that a random matrix $A_\omega$, either Gaussian or uniform as
mentioned above, satisfies the LQ$_p(\alpha)$ property with

$$\alpha = \frac{1}{C(p)}\left(\mu^2\,\frac{\log(N/M)}{M}\right)^{1/p - 1/2}.$$

Once we establish this property, the proof of instance optimality in probability
for $\Delta_p$ proceeds largely unchanged from Wojtaszczyk's proof with
modifications to account only for the non-convexity of the $\ell^p$-quasinorm
with $0 < p < 1$.
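The non-convexity mentioned here shows up already in two dimensions. The sketch below uses a standard fact that is not stated in this excerpt: for $0 < p < 1$ the triangle inequality fails for $\|\cdot\|_p$, while the weaker inequality $\|u+v\|_p^p \le \|u\|_p^p + \|v\|_p^p$ still holds. The vectors are an illustrative choice of ours.

```python
def lp_quasinorm(y, p):
    """(sum_i |y_i|^p)^(1/p); a norm for p >= 1, only a quasinorm for 0 < p < 1."""
    return sum(abs(t) ** p for t in y) ** (1.0 / p)


p = 0.5
u, v = [1.0, 0.0], [0.0, 1.0]
w = [a + b for a, b in zip(u, v)]

lhs = lp_quasinorm(w, p)                        # ||u + v||_p
rhs = lp_quasinorm(u, p) + lp_quasinorm(v, p)   # ||u||_p + ||v||_p
print(lhs > rhs)                                # triangle inequality is violated

# The p-th powers are still subadditive.
print(lhs ** p <= lp_quasinorm(u, p) ** p + lp_quasinorm(v, p) ** p + 1e-12)
```

Arguments written for $\ell^1$ that rely on the triangle inequality therefore have to be rerun with $p$-th powers, which is one source of the modified constants.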

Next, we present our results on instance optimality of the $\Delta_p$ decoder,
while deferring the proofs to Section 4. Throughout the rest of the paper, we
focus on two classes of random matrices: $A_\omega$ denotes $M \times N$
matrices, the entries of which are drawn from a zero mean, normalized
column-variance Gaussian distribution, i.e., $A_\omega = (a_{ij})$ where
$a_{ij} \sim \mathcal{N}(0, 1/\sqrt{M})$; in this case, we say that $A_\omega$
is an $M \times N$ Gaussian random matrix. $\tilde{A}_\omega$, on the other
hand, denotes $M \times N$ matrices, the columns of which are drawn uniformly
from the sphere; in this case we say that $\tilde{A}_\omega$ is an $M \times N$
uniform random matrix. In each case, $(\Omega, P)$ denotes the associated
probability space.
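Both matrix classes are straightforward to sample. In the sketch below we read $1/\sqrt{M}$ as the standard deviation of each entry, which is the reading consistent with "normalized column-variance" (expected squared column norm equal to one); this interpretation, the function names, and the use of normalized Gaussian vectors to sample the sphere are our assumptions.

```python
import math
import random


def gaussian_random_matrix(M, N, rng):
    """M x N matrix with i.i.d. zero-mean Gaussian entries, standard deviation 1/sqrt(M)."""
    sigma = 1.0 / math.sqrt(M)
    return [[rng.gauss(0.0, sigma) for _ in range(N)] for _ in range(M)]


def uniform_random_matrix(M, N, rng):
    """M x N matrix whose columns are drawn uniformly from the unit sphere in R^M.

    A normalized standard Gaussian vector is uniformly distributed on the sphere.
    """
    cols = []
    for _ in range(N):
        g = [rng.gauss(0.0, 1.0) for _ in range(M)]
        norm = math.sqrt(sum(t * t for t in g))
        cols.append([t / norm for t in g])
    # Transpose the list of columns into a list of rows.
    return [[cols[j][i] for j in range(N)] for i in range(M)]


rng = random.Random(0)
A_gauss = gaussian_random_matrix(5, 8, rng)
A_unif = uniform_random_matrix(5, 8, rng)
print(sum(A_unif[i][0] ** 2 for i in range(5)))  # squared norm of the first column
```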

We start with a lemma (which generalizes an analogous result of [30]) that shows
that the matrices $A_\omega$ and $\tilde{A}_\omega$ satisfy the LQ$_p$ property
with high probability.

Lemma 2.12 Let $0 < p \le 1$, and let $A_\omega$ be an $M \times N$ Gaussian
random matrix. For $0 < \mu < 1/\sqrt{2}$, suppose that
$K_1 M(\log M)^\xi \le N \le e^{K_2 M}$ for some $\xi > (1 - 2\mu^2)^{-1}$ and
some constants $K_1, K_2 > 0$. Then, there exists a constant
$c = c(\mu, \xi, K_1, K_2) > 0$, independent of $p$, $M$, and $N$, and a set

$$\Omega_\mu = \left\{\omega \in \Omega : A_\omega(B_p^N) \supset \frac{1}{C(p)}\left(\mu^2\,\frac{\log(N/M)}{M}\right)^{1/p-1/2} B_2^M\right\}$$

such that $P(\Omega_\mu) \ge 1 - e^{-cM}$.

In other words, $A_\omega$ satisfies the LQ$_p(\alpha)$ property with
$\alpha = \frac{1}{C(p)}\left(\mu^2\,\frac{\log(N/M)}{M}\right)^{1/p-1/2}$, with
probability $\ge 1 - e^{-cM}$ on the draw of the matrix. Here $C(p)$ is a
positive constant that depends only on $p$. (In particular, $C(1) = 1$; see (50)
for the explicit value of $C(p)$ when $0 < p < 1$.) This statement is true also
for $\tilde{A}_\omega$.

The above lemma for $p = 1$ can be found in [30]. As we will see in Section 4,
the generalization of this result to $0 < p < 1$ is non-trivial and requires a
result from [18], cf. [23], relating certain "distances" of $p$-convex bodies to
their convex hulls. It is important to note that this lemma provides the
machinery needed to prove the following theorem, which extends to $\Delta_p$,
$0 < p < 1$, the analogous result of Wojtaszczyk [30] for $\Delta_1$.

In what follows, for a set $T \subseteq \{1, \ldots, N\}$,
$T^c := \{1, \ldots, N\} \setminus T$; for $y \in \mathbb{R}^N$, $y_T$ denotes
the vector with entries $y_T(j) = y(j)$ for all $j \in T$, and $y_T(j) = 0$ for
$j \in T^c$.

Theorem 2.13 Let $0 < p < 1$. Suppose that $A \in \mathbb{R}^{M \times N}$
satisfies RIP$(S, \delta)$ and
LQ$_p\left(\frac{1}{C(p)}(\mu^2/S)^{1/p-1/2}\right)$ for some $\mu > 0$ and
$C(p)$ as in (50). Let $\Delta$ be an arbitrary decoder. If $(A, \Delta)$ is
$(2,p)$ instance optimal of order $S$ with constant $C_{2,p}$, then for any
$x \in \mathbb{R}^N$ and $e \in \mathbb{R}^M$, all of the following hold.

(i) $\|\Delta(Ax + e) - x\|_2 \le C\left(\|e\|_2 + \frac{\sigma_S(x)_{\ell^p}}{S^{1/p-1/2}}\right)$
(ii) $\|\Delta(Ax) - x\|_2 \le C\left(\|Ax_{T_0^c}\|_2 + \sigma_S(x)_{\ell^2}\right)$
(iii) $\|\Delta(Ax + e) - x\|_2 \le C\left(\|e\|_2 + \sigma_S(x)_{\ell^2} + \|Ax_{T_0^c}\|_2\right)$

Above, $T_0$ denotes the set of indices of the largest (in magnitude) $S$
coefficients of $x$; the constants (all denoted by $C$) depend on $\delta$,
$\mu$, $p$, and $C_{2,p}$ but not on $M$ and $N$. For the explicit values of
these constants, see Section 4.

Finally, our main theorem on the instance optimality in probability of the
$\Delta_p$ decoder follows.

Theorem 2.14 Let $0 < p < 1$, and let $A_\omega$ be an $M \times N$ Gaussian
random matrix. Suppose that $N \ge M[\log(M)]^2$. There exist constants
$c_1, c_2, c_3 > 0$ such that for all $S \in \mathbb{N}$ with
$S \le c_1 M/\log(N/M)$, the following are true.

(i) There exists $\Omega_1$ with $P(\Omega_1) \ge 1 - 3e^{-c_2 M}$ such that for
all $\omega \in \Omega_1$

$$\|\Delta_p(A_\omega(x) + e) - x\|_2 \le C\left(\|e\|_2 + \frac{\sigma_S(x)_{\ell^p}}{S^{1/p-1/2}}\right), \qquad (20)$$

for any $x \in \mathbb{R}^N$ and for any $e \in \mathbb{R}^M$.

(ii) For any $x \in \mathbb{R}^N$, there exists $\Omega_x$ with
$P(\Omega_x) \ge 1 - 4e^{-c_3 M}$ such that for all $\omega \in \Omega_x$

$$\|\Delta_p(A_\omega(x) + e) - x\|_2 \le C\left(\|e\|_2 + \sigma_S(x)_{\ell^2}\right), \qquad (21)$$

for any $e \in \mathbb{R}^M$.

The statement also holds for $\tilde{A}_\omega$, i.e., for random matrices the
columns of which are drawn independently from a uniform distribution on the
sphere.

Remark 2.15 The constants above (both denoted by $C$) depend on the parameters
of the particular LQ$_p$ and RIP properties that the matrix satisfies, and are
given explicitly in Section 4. The constants $c_1$, $c_2$, and $c_3$ depend only
on $p$ and the distribution of the underlying random matrix (see the proof in
Section 4.5) and are independent of $M$ and $N$.

Remark 2.16 Clearly, the statements do not make sense if the hypothesis of the
theorem forces $S$ to be 0. In turn, for a given $(M, N)$ pair, it is possible
that there is no positive integer $S$ for which the conclusions of Theorem 2.14
hold. In particular, to get a non-trivial statement, one needs
$M > \frac{1}{c_1}\log(N/M)$.
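The non-triviality requirement can be made concrete by computing the largest admissible $S$ under the constraint $S \le c_1 M/\log(N/M)$. In the sketch below, the numerical values of `c1` are purely hypothetical placeholders, since the excerpt does not give the constant.

```python
import math


def max_sparsity(M, N, c1):
    """Largest integer S with S <= c1 * M / log(N / M); 0 means the statement is vacuous."""
    if not (0 < M < N):
        raise ValueError("need 0 < M < N")
    return math.floor(c1 * M / math.log(N / M))


# Hypothetical values of c1, for illustration only.
for c1 in (0.1, 0.01):
    print(c1, max_sparsity(100, 300, c1))
```

A return value of zero corresponds to the degenerate case described in the remark.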

Remark 2.17 Note the difference in the order of the quantifiers between
conclusions (i) and (ii) of Theorem 2.14. Specifically, with statement (i), once
the matrix is drawn from the "good" set $\Omega_1$, we obtain the error
guarantee (20) for every $x$ and $e$. In other words, after the initial draw of
a good matrix $A$, stability and robustness in the sense of (20) are ensured. On
the other hand, statement (ii) concludes that associated with every $x$ is a
"good" set $\Omega_x$ (possibly different for different $x$) such that if the
matrix is drawn from $\Omega_x$, then stability and robustness in the sense of
(21) are guaranteed. Thus, in (ii), for every $x$, a different matrix is drawn,
and with high probability on that draw (21) holds.

Remark 2.18 The above theorem pertains to the decoders $\Delta_p$ which, like
the analogous theorem for $\Delta_1$ presented in [30], require no knowledge of
the noise level. In other words, $\Delta_p$ provides estimates of sparse and
compressible signals from limited and noisy observations without having to
explicitly account for the noise in the decoding. This provides an improvement
on Theorem 2.1 and a practical advantage when estimates of measurement noise
levels are absent.



3 Numerical Experiments 

In this section, we present some numerical experiments to highlight impor- 
tant aspects of sparse reconstruction by decoding using Ap, < p < 1. First, 
we compare the sufficient conditions under which decoding with Ap guaran- 
tees perfect recovery of signals in for different values of p and S. Next, 
we present numerical results illustrating the robustness and instance optimal- 
ity of the Ap decoder. Here, we wish to observe the linear growth of the i"^ 
reconstruction error || Ap(ylx + e) — x\\2, as a function of as{x)£2 and of ||e||2. 

To that end, we generate a 100 x 300 matrix A whose columns are drawn 
from a Gaussian distribution and we estimate its RIP constants 6s via Monte 
Carlo (MC) simulations. Under the assumption that the estimated constants 
are the correct ones (while in fact they are only lower bounds). Figure [2] (left) 
shows the regions where (I12p guarantees recovery for different (5, p)-pairs. 
On the other hand. Figure [2] (right) shows the empirical recovery rates via 
quasinorm minimization: To obtain this figure, for every S = 1,...,49, 
we chose 50 different instances of x G where non-zero coefficients of 

each were drawn i.i.d. from the standard Gaussian distribution. These vectors 
were encoded using the same measurement matrix A as above. Since there is 



14 



no known algorithm that will yield the global minimizer of the optimization problem (11), we approximated the action of $\Delta_p$ by using a projected gradient algorithm on a sequence of smoothed versions of the $\ell^p$ minimization problem: In (11), instead of minimizing $\|y\|_p$, we minimized $\big(\sum_i (y_i^2 + \epsilon^2)^{p/2}\big)^{1/p}$, initially with a large $\epsilon$. We then used the corresponding solution as the starting point of the next subproblem, obtained by decreasing the value of $\epsilon$ according to the rule $\epsilon_n = (0.99)\epsilon_{n-1}$. We continued reducing the value of $\epsilon$ and solving the corresponding subproblem until $\epsilon$ became very small. Note that this approach is similar to the one described in [7]. The empirical results show that $\Delta_p$ (in fact, the approximation of $\Delta_p$ as described above) is successful in a wider range of scenarios than those predicted by Theorem 2.1. This can be attributed to the fact that the conditions presented in this paper are only sufficient, or to the fact that in practice what is observed is not necessarily a manifestation of uniform recovery. Rather, the practical results could be interpreted as success of $\Delta_p$ with high probability on either $x$ or $A$.
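The continuation scheme described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used for the experiments: the function name `smoothed_lp_decode`, the starting value `eps0`, the stopping value `eps_min`, the number of inner iterations, and the step size $\epsilon^{2-p}/p$ (the reciprocal of a bound on the curvature of the smoothed objective) are all choices made here for illustration. It handles the noise-free constraint $Ay = b$ only.

```python
import numpy as np

def smoothed_lp_decode(A, b, p=0.5, eps0=1.0, eps_min=1e-3, decay=0.99, inner_iters=10):
    """Heuristic for min ||y||_p subject to A y = b.

    Runs projected gradient descent on f_eps(y) = sum_i (y_i^2 + eps^2)^(p/2)
    and shrinks eps geometrically (eps_n = decay * eps_{n-1}), warm-starting
    each subproblem at the previous solution.
    """
    A_pinv = np.linalg.pinv(A)
    y = A_pinv @ b                          # minimum l2-norm feasible point
    P = np.eye(A.shape[1]) - A_pinv @ A     # orthogonal projector onto null(A)
    eps = eps0
    while eps > eps_min:
        step = eps ** (2 - p) / p           # 1/L, where L = p * eps^(p-2) bounds |f_eps''|
        for _ in range(inner_iters):
            grad = p * y * (y ** 2 + eps ** 2) ** (p / 2 - 1)
            y = y - step * (P @ grad)       # the update stays in the set {y : A y = b}
        eps *= decay
    return y
```

Because every update moves along the null space of $A$, the iterate remains feasible, and with this step size the smoothed objective does not increase within a subproblem. Nothing here guarantees convergence to the global minimizer of the non-convex problem.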



[Figure 2, two panels. Left: region where recovery with $\Delta_p$ is "guaranteed" for $p$ and $S$ (light shading = recoverable). Right: empirical recovery rates with $\Delta_p$.]

Fig. 2. For a Gaussian matrix $A \in \mathbb{R}^{100 \times 300}$, whose $\delta_S$ values are estimated via MC simulations, we generate the theoretical (left) and practical (right) phase-diagrams for reconstruction via $\ell^p$ minimization.



Next, we generate scenarios that allude to the conclusions of Theorem 2.14. To that end, we generate a signal composed of $x_T \in \Sigma_{40}^{300}$, supported on an index set $T$, and a signal $z_{T^c}$ supported on $T^c$, where all the coefficients are drawn from the standard Gaussian distribution. We then normalize $x_T$ and $z_{T^c}$ so that $\|x_T\|_2 = \|z_{T^c}\|_2 = 1$ and generate $x = x_T + \lambda z_{T^c}$ with increasing values of $\lambda$ (starting from 0), thereby increasing $\sigma_{40}(x)_{\ell^2} \approx \lambda$. For this experiment, we choose our measurement matrix $A \in \mathbb{R}^{100 \times 300}$ by drawing its columns uniformly from the sphere. For each value of $\lambda$ we measure the reconstruction error $\|\Delta_p(Ax) - x\|_2$, and we repeat the process 10 times while randomizing the index set $T$ but preserving the coefficient values. We report the averaged results in Figure 3 (left) for different values of $p$. Similarly, we generate noisy observations $Ax_T + \lambda e$ of a sparse signal $x_T \in \Sigma_{40}^{300}$, where $\|x_T\|_2 = \|e\|_2 = 1$, and we increase the noise level starting from $\lambda = 0$. Here, again, the non-zero entries of $x_T$ and all entries of $e$ were chosen i.i.d. from the standard Gaussian distribution and then the vectors were properly normalized. Next, we measure $\|\Delta_p(Ax_T + \lambda e) - x_T\|_2$ (for 10 realizations where we randomize $T$) and report the averaged results in Figure 3 (right) for different values of $p$. In both these experiments, we observe that the error increases roughly linearly as we increase $\lambda$, i.e., $\sigma_{40}(x)_{\ell^2}$ and the noise power, respectively. Moreover, when the signal is highly compressible or when the noise level is low, we observe that reconstruction using $\Delta_p$ with $0 < p < 1$ yields a lower approximation error than that with $p = 1$. It is also worth noting that for values of $p$ close to one, even in the case of sparse signals with no noise, the average reconstruction error is non-zero. This may be due to the fact that for such large $p$ the number of measurements is not sufficient for the recovery of signals with $S = 40$, further highlighting the benefits of using the decoder $\Delta_p$ with smaller values of $p$.

Fig. 3. Reconstruction error with compressible signals (left), noisy observations (right). Observe the almost linear growth of the error in compressible signals and for different values of $p$, highlighting the instance optimality of the decoders. The plots were generated by averaging the results of 10 experiments with the same matrix $A$ and randomized locations of the coefficients of $x$.
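As an illustration of the signal model in this experiment, the following sketch builds $x = x_T + \lambda z_{T^c}$ and evaluates the best $S$-term approximation error. The helper names `best_s_term_error` and `compressible_signal` are hypothetical, and the random generator is an arbitrary choice.

```python
import numpy as np

def best_s_term_error(x, S, p=2.0):
    """sigma_S(x) in l^p: (quasi-)norm of x after removing its S largest-magnitude entries."""
    tail = np.sort(np.abs(x))[: max(len(x) - S, 0)]   # the N - S smallest magnitudes
    return float(np.sum(tail ** p) ** (1.0 / p))

def compressible_signal(N, S, lam, rng):
    """x = x_T + lam * z_{T^c} with ||x_T||_2 = ||z_{T^c}||_2 = 1 and |T| = S."""
    T = rng.choice(N, size=S, replace=False)
    x_T = np.zeros(N)
    x_T[T] = rng.standard_normal(S)
    z = rng.standard_normal(N)
    z[T] = 0.0                                        # z is supported on the complement of T
    return x_T / np.linalg.norm(x_T) + lam * z / np.linalg.norm(z)
```

Discarding everything outside $T$ leaves an error of exactly $\lambda$, so $\sigma_S(x)_{\ell^2} \le \lambda$ always holds; the two are close when $\lambda$ is small relative to the entries of $x_T$.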



Finally, in Figure 4 we plot the results of an experiment in which we generate signals $x \in \mathbb{R}^{200}$ with sorted coefficients $x(j)$ that decay according to some power law. In particular, for various values of $0 < q < 1$, we set $x(j) = c j^{-1/q}$ such that $\|x\|_2 = 1$. We then encode $x$ with 50 different $100 \times 200$ measurement matrices, the columns of which were drawn from the uniform distribution on the sphere, and examine the approximations obtained by decoding with $\Delta_p$ for different values of $0 < p < 1$. The results indicate that values of $p \approx q$ provide the lowest reconstruction errors. Note that in Figure 4 we report the results in the form of signal to noise ratios defined as

\[ \mathrm{SNR} = 20 \log_{10} \left( \frac{\|x\|_2}{\|\Delta(Ax) - x\|_2} \right). \]

[Figure 4, two panels, each titled "Performance of $\Delta_p$ on Compressible Signals".]

Fig. 4. Reconstruction signal to noise ratios (in dB) obtained by using $\Delta_p$ to recover signals whose sorted coefficients decay according to a power law ($x(j) = c j^{-1/q}$, $\|x\|_2 = 1$) as a function of $q$ (left) and as a function of $p$ (right). The presented results are averages of 50 experiments performed with different matrices in $\mathbb{R}^{100 \times 200}$. Observe that for highly compressible signals, e.g., for $q = 0.4$, there is a 5 dB gain in using $p < 0.6$ as compared to $p = 1$. The performance advantage is about 2 dB for $q = 0.6$. As the signals become much less compressible, i.e., as we increase $q$ to 0.9, the performances are almost identical.
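The power-law test signals and the SNR figure of merit can be written down directly. This is a small sketch with hypothetical helper names (`power_law_signal`, `snr_db`); the decoder itself is not included.

```python
import numpy as np

def power_law_signal(N, q):
    """Sorted coefficients x(j) = c * j^(-1/q), j = 1..N, scaled so that ||x||_2 = 1."""
    j = np.arange(1, N + 1, dtype=float)
    x = j ** (-1.0 / q)
    return x / np.linalg.norm(x)

def snr_db(x, x_hat):
    """SNR = 20 * log10(||x||_2 / ||x_hat - x||_2), in dB."""
    return 20.0 * np.log10(np.linalg.norm(x) / np.linalg.norm(x_hat - x))
```

For example, a reconstruction whose error has one tenth of the signal's $\ell^2$ norm has an SNR of 20 dB.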

4 Proofs 



4.1 Proof of Proposition 2.10



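First, note that for any $A \in \mathbb{R}^{M \times N}$, $\delta_m$ is non-decreasing in $m$. Also, the map $k \mapsto \frac{k-1}{k+1}$ is increasing in $k$ for $k \ge 0$.

Set

\[ L := (k+1)S_1, \quad \tilde{\ell} := k^{\frac{p}{2-p}}, \quad \text{and} \quad \tilde{S}_p := \frac{L}{\tilde{\ell} + 1}. \]

Then

\[ \delta_{(k+1)S_1} < \frac{k-1}{k+1} = \frac{\tilde{\ell}^{\frac{2-p}{p}} - 1}{\tilde{\ell}^{\frac{2-p}{p}} + 1}. \]

We now describe how to choose $\ell$ and $S_p$ such that $\ell \ge \tilde{\ell}$, $S_p \in \mathbb{N}$, and $(\ell + 1)S_p = L$ (this will be sufficient to complete the proof using the monotonicity observations above). First, note that this last equality is satisfied only if $(\ell, S_p)$ is in the set

\[ \left\{ \left( \frac{n}{L-n},\, L-n \right) : n = 1, \dots, L-1 \right\}. \]

Let $n^*$ be such that

\[ \frac{n^* - 1}{L - n^* + 1} < \tilde{\ell} \le \frac{n^*}{L - n^*}. \tag{22} \]

To see that such an $n^*$ exists, recall that $\tilde{\ell} = k^{\frac{p}{2-p}}$ where $0 < p < 1$. Also, $(k+1)S_1 = L$ with $S_1 \in \mathbb{N}$, and $k > 1$. Consequently, $1 < \tilde{\ell} < k \le L - 1$, and $k \in \{ \frac{n}{L-n} : n = \lceil L/2 \rceil, \dots, L-1 \}$. Thus, we know that we can find $n^*$ as above.

Furthermore, $\frac{n^*}{L - n^*} > 1$. It follows from (22) that

\[ L - n^* \le \tilde{S}_p < L - n^* + 1. \]

We now choose

\[ \ell = \frac{n^*}{L - n^*} \quad \text{and} \quad S_p = \lfloor \tilde{S}_p \rfloor = L - n^*. \]

Then $(\ell + 1)S_p = L$, and $\ell \ge \tilde{\ell}$. So, we conclude that for $\ell$ as above and

\[ S_p = \lfloor \tilde{S}_p \rfloor = \left\lfloor \frac{k+1}{k^{\frac{p}{2-p}} + 1}\, S_1 \right\rfloor \]

we have

\[ \delta_{(\ell+1)S_p} < \frac{\ell^{\frac{2-p}{p}} - 1}{\ell^{\frac{2-p}{p}} + 1}. \]

Consequently, the condition of Corollary 2.5 is satisfied and we have the desired conclusion. □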
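The selection of $n^*$, $\ell$ and $S_p$ in the proof above can be checked numerically. The sketch below assumes integer $k > 1$ (so that $L$ is an integer) and uses a hypothetical function name; it scans for the smallest $n$ satisfying the right-hand inequality in (22), which automatically satisfies the left-hand one.

```python
from fractions import Fraction

def choose_l_and_Sp(k, S1, p):
    """Return (l, S_p, l_tilde) with l >= l_tilde, S_p an integer and (l + 1) * S_p = L."""
    L = (k + 1) * S1                      # an integer when k and S1 are integers
    l_tilde = k ** (p / (2 - p))
    for n in range(1, L):
        if l_tilde <= n / (L - n):        # first n with l_tilde <= n / (L - n) is n*
            return Fraction(n, L - n), L - n, l_tilde
    raise ValueError("no admissible n found; the proof requires k > 1")
```

With $k = 2$, $S_1 = 10$ and $p = 1/2$ this gives $L = 30$, $n^* = 17$, $\ell = 17/13$ and $S_p = 13 = \lfloor 30/(2^{1/3} + 1) \rfloor$.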



4.2 Proof of Theorem 



We modify the proof of Candès et al. of the analogous result for the decoder $\Delta_1$ (Theorem 2 in [4]) to account for the non-convexity of the $\ell^p$ quasinorm. We give the full proof for completeness. We stick to the notation of [4] whenever possible.

Let $0 < p < 1$, $x \in \mathbb{R}^N$ be arbitrary, and define $x^* := \Delta_p^{\epsilon}(b)$ and $h := x^* - x$. Our goal is to obtain an upper bound on $\|h\|_2$ given that $\|Ah\|_2 \le 2\epsilon$ (by definition of $\Delta_p^{\epsilon}$).

Below, for a set $T \subseteq \{1, \dots, N\}$, $T^c := \{1, \dots, N\} \setminus T$; for $y \in \mathbb{R}^N$, $y_T$ denotes the vector with entries $y_T(j) = y(j)$ for all $j \in T$, and $y_T(j) = 0$ for $j \in T^c$.

(I) We start by decomposing $h$ as a sum of sparse vectors with disjoint support. In particular, denote by $T_0$ the set of indices of the largest (in magnitude) $S$ coefficients of $x$ (here $S$ is to be determined later). Next, partition $T_0^c$ into sets $T_1, T_2, \dots$, $|T_j| = L$ for $j \ge 1$ where $L \in \mathbb{N}$ (also to be determined later), such that $T_1$ is the set of indices of the $L$ largest (in magnitude) coefficients of $h_{T_0^c}$, $T_2$ is the set of indices of the second $L$ largest coefficients of $h_{T_0^c}$, and so on. Finally let $T_{01} := T_0 \cup T_1$. We now obtain a lower bound for $\|Ah\|_2^p$ using the RIP constants of the matrix $A$. In particular, we have

\[
\begin{aligned}
\|Ah\|_2^p &= \Big\| A h_{T_{01}} + \sum_{j \ge 2} A h_{T_j} \Big\|_2^p \\
&\ge \|A h_{T_{01}}\|_2^p - \sum_{j \ge 2} \|A h_{T_j}\|_2^p \\
&\ge (1 - \delta_{L + |T_0|})^{p/2} \|h_{T_{01}}\|_2^p - (1 + \delta_L)^{p/2} \sum_{j \ge 2} \|h_{T_j}\|_2^p.
\end{aligned} \tag{23}
\]

Above, together with RIP, we used the fact that $\|\cdot\|_2^p$ satisfies the triangle inequality for any $0 < p < 1$. What now remains is to relate $\|h_{T_{01}}\|_2^p$ and $\sum_{j \ge 2} \|h_{T_j}\|_2^p$ to $\|h\|_2$.

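(II) Next, we aim to bound $\sum_{j \ge 2} \|h_{T_j}\|_2^p$ from above in terms of $\|h\|_2$. To that end, we proceed as in [4]. First, note that $|h_{T_{j+1}}(\ell)|^p \le |h_{T_j}(\ell')|^p$ for $\ell \in T_{j+1}$, $\ell' \in T_j$, and thus $|h_{T_{j+1}}(\ell)|^p \le \|h_{T_j}\|_p^p / L$. It follows that $\|h_{T_{j+1}}\|_2^2 \le L^{1 - 2/p} \|h_{T_j}\|_p^2$, and consequently

\[ \sum_{j \ge 2} \|h_{T_j}\|_2^p \le L^{p/2 - 1} \sum_{j \ge 1} \|h_{T_j}\|_p^p = L^{p/2 - 1} \|h_{T_0^c}\|_p^p. \tag{24} \]

Next, note that, similar to the case when $p = 1$ as shown in [4], the "error" $h$ is concentrated on the "essential support" of $x$ (in our case $T_0$). To quantify this claim, we repeat the analogous calculation in [4]: Note, first, that by definition of $x^*$,

\[ \|x^*\|_p^p = \|x + h\|_p^p = \|x_{T_0} + h_{T_0}\|_p^p + \|x_{T_0^c} + h_{T_0^c}\|_p^p \le \|x\|_p^p. \]

As $\|\cdot\|_p^p$ satisfies the triangle inequality, we then have

\[ \|x_{T_0}\|_p^p - \|h_{T_0}\|_p^p + \|h_{T_0^c}\|_p^p - \|x_{T_0^c}\|_p^p \le \|x\|_p^p. \]

Consequently, since $\|x\|_p^p = \|x_{T_0}\|_p^p + \|x_{T_0^c}\|_p^p$,

\[ \|h_{T_0^c}\|_p^p \le \|h_{T_0}\|_p^p + 2\|x_{T_0^c}\|_p^p, \tag{25} \]

which, together with (24), implies

\[ \sum_{j \ge 2} \|h_{T_j}\|_2^p \le L^{p/2 - 1} \big( \|h_{T_0}\|_p^p + 2\|x_{T_0^c}\|_p^p \big) \le \rho^{1 - p/2} \Big( \|h_{T_{01}}\|_2^p + 2 |T_0|^{p/2 - 1} \|x_{T_0^c}\|_p^p \Big), \tag{26} \]

where $\rho := |T_0| / L$, and we used the fact that $\|h_{T_0}\|_p^p \le |T_0|^{1 - p/2} \|h_{T_0}\|_2^p$ (which follows as $|\mathrm{supp}(h_{T_0})| = |T_0|$). Using (26) and (23), we obtain

\[ \|Ah\|_2^p \ge C_{p,L,|T_0|} \|h_{T_{01}}\|_2^p - 2 \rho^{1 - p/2} |T_0|^{p/2 - 1} (1 + \delta_L)^{p/2} \|x_{T_0^c}\|_p^p, \tag{27} \]

where

\[ C_{p,L,|T_0|} := (1 - \delta_{L + |T_0|})^{p/2} - (1 + \delta_L)^{p/2} \rho^{1 - p/2}. \tag{28} \]

At this point, using $\|Ah\|_2 \le 2\epsilon$, we obtain an upper bound on $\|h_{T_{01}}\|_2$ given by

\[ \|h_{T_{01}}\|_2^p \le \frac{1}{C_{p,L,|T_0|}} \left( (2\epsilon)^p + 2 \rho^{1 - p/2} (1 + \delta_L)^{p/2} \frac{\|x_{T_0^c}\|_p^p}{|T_0|^{1 - p/2}} \right), \tag{29} \]

provided $C_{p,L,|T_0|} > 0$ (this will impose the condition given in (12) on the RIP constants of the underlying matrix $A$).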

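(III) To complete the proof, we will show that the error vector $h$ is concentrated on $T_{01}$. Denote by $h_{T_0^c}[m]$ the $m$th largest (in magnitude) coefficient of $h_{T_0^c}$ and observe that $|h_{T_0^c}[m]|^p \le \|h_{T_0^c}\|_p^p / m$. As $h_{T_{01}^c}[m] = h_{T_0^c}[L + m]$, we then have

\[ \|h_{T_{01}^c}\|_2^2 = \sum_{m \ge L+1} |h_{T_0^c}[m]|^2 \le \sum_{m \ge L+1} \left( \frac{\|h_{T_0^c}\|_p^p}{m} \right)^{2/p} \le \frac{\|h_{T_0^c}\|_p^2}{L^{2/p - 1} (2/p - 1)}. \tag{30} \]

Here, the last inequality follows because for $0 < p < 1$

\[ \sum_{m \ge L+1} m^{-2/p} \le \int_L^{\infty} t^{-2/p} \, dt = \frac{1}{L^{2/p - 1} (2/p - 1)}. \]

Finally, we use (25) and (30) to conclude

\[
\begin{aligned}
\|h\|_2^2 &= \|h_{T_{01}}\|_2^2 + \|h_{T_{01}^c}\|_2^2 \\
&\le \|h_{T_{01}}\|_2^2 + \left[ \frac{\|h_{T_0}\|_p^p + 2\|x_{T_0^c}\|_p^p}{L^{1 - p/2} (2/p - 1)^{p/2}} \right]^{2/p} \\
&\le \left[ \Big( 1 + \rho^{1 - p/2} (2/p - 1)^{-p/2} \Big) \|h_{T_{01}}\|_2^p + 2 \rho^{1 - p/2} (2/p - 1)^{-p/2} \frac{\|x_{T_0^c}\|_p^p}{|T_0|^{1 - p/2}} \right]^{2/p}.
\end{aligned} \tag{31}
\]

Above, we used the fact that $\|h_{T_0}\|_p^p \le |T_0|^{1 - p/2} \|h_{T_0}\|_2^p$, and that for any $a, b \ge 0$ and $\alpha \ge 1$, $a^{\alpha} + b^{\alpha} \le (a + b)^{\alpha}$.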

(IV) We now set $|T_0| = S$, $L = kS$ where $k$ and $S$ are chosen such that $C_{p,kS,S} > 0$, which is equivalent to having $k$, $S$, and $p$ satisfy (12). In this case, $\|x_{T_0^c}\|_p = \sigma_S(x)_{\ell^p}$, $\rho = 1/k$, and combining (29) and (31) yields

\[ \|h\|_2^p \le C_1 \epsilon^p + C_2 \frac{\sigma_S(x)_{\ell^p}^p}{S^{1 - p/2}}, \tag{32} \]

where $C_1$ and $C_2$ are as in (14) and (15), respectively. □
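The equivalence invoked in step (IV) can be illustrated numerically. Condition (12) is not restated in this section, so the sketch below assumes it reads $\delta_{kS} + k^{2/p-1}\delta_{(k+1)S} < k^{2/p-1} - 1$, which is what rearranging $C_{p,kS,S} > 0$ in (28) with $\rho = 1/k$ gives. The function names are hypothetical.

```python
def C_const(p, k, delta_kS, delta_k1S):
    """C_{p,kS,S} from (28) with L = kS, |T_0| = S, rho = 1/k."""
    rho = 1.0 / k
    return (1 - delta_k1S) ** (p / 2) - (1 + delta_kS) ** (p / 2) * rho ** (1 - p / 2)

def condition_12(p, k, delta_kS, delta_k1S):
    """Assumed form of (12): delta_kS + k^(2/p-1) * delta_(k+1)S < k^(2/p-1) - 1."""
    t = k ** (2 / p - 1)
    return delta_kS + t * delta_k1S < t - 1
```

For $p = 1/2$ and $k = 2$, the pair $(\delta_{2S}, \delta_{3S}) = (0.3, 0.5)$ satisfies the assumed condition and gives a positive constant, while $(0.9, 0.9)$ violates it and gives a negative one.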



4.3 Proof of Lemma 2.12



(I) The following result of Wojtaszczyk [30, Proposition 2.2] will be useful.

Proposition 4.1 ([30]) Let $A_\omega$ be an $M \times N$ Gaussian random matrix, let $0 < \mu < 1/\sqrt{2}$, and suppose that $K_1 M (\log M)^{\xi} \le N \le e^{K_2 M}$ for some $\xi > (1 - 2\mu^2)^{-1}$ and some constants $K_1, K_2 > 0$. Then, there exists a constant $c = c(\mu, \xi, K_1, K_2) > 0$, independent of $M$ and $N$, and a set $\Omega_\mu$ with $P(\Omega_\mu) \ge 1 - e^{-cM}$ such that for all $\omega \in \Omega_\mu$,

\[ A_\omega(B_1^N) \supseteq \mu \sqrt{\frac{\log(N/M)}{M}}\, B_2^M. \]

The above statement is true also for $\tilde{A}_\omega$.

We will also use the following adaptation of [18, Lemma 2], for which we will first introduce some notation. Define a body to be a compact set containing the origin as an interior point and star shaped with respect to the origin [23]. Below, we use $\mathrm{conv}(K)$ to denote the convex hull of a body $K$. For $K \subseteq B$, we denote by $d_1(K, B)$ the "distance" between $K$ and $B$ given by

\[ d_1(K, B) := \inf\{\lambda > 0 : K \subseteq B \subseteq \lambda K\} = \inf\Big\{\lambda > 0 : \frac{1}{\lambda} B \subseteq K \subseteq B\Big\}. \]

Finally, we call a body $K$ $p$-convex if for any $x, y \in K$, $\lambda x + \mu y \in K$ whenever $\lambda, \mu \in [0, 1]$ are such that $\lambda^p + \mu^p = 1$.

Lemma 4.2 Let $0 < p < 1$, and let $K$ be a $p$-convex body in $\mathbb{R}^n$. If $\mathrm{conv}(K) \subseteq B_2^n$, then

\[ d_1(K, B_2^n) \le C(p)\, d_1(\mathrm{conv}(K), B_2^n)^{2/p - 1}, \]

where $C(p)$ is given explicitly in (50).

We defer the proof of this lemma to the Appendix.

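(II) Note that $\tilde{A}_\omega(B_1^N) \subseteq B_2^M$. This follows because $\|\tilde{A}_\omega\|_{1 \to 2}$, which is equal to the largest column norm of $\tilde{A}_\omega$, is 1 by construction. Thus, for $x \in B_1^N$,

\[ \|\tilde{A}_\omega(x)\|_2 \le \|\tilde{A}_\omega\|_{1 \to 2} \|x\|_1 \le 1, \]

that is, $\tilde{A}_\omega(B_1^N) \subseteq B_2^M$, and so $d_1(\tilde{A}_\omega(B_1^N), B_2^M)$ is well-defined. Next, by Proposition 4.1, we know that there exists $\Omega_\mu$ with $P(\Omega_\mu) \ge 1 - e^{-cM}$ such that for all $\omega \in \Omega_\mu$,

\[ \tilde{A}_\omega(B_1^N) \supseteq \mu \sqrt{\frac{\log(N/M)}{M}}\, B_2^M. \tag{33} \]

From this point on, let $\omega \in \Omega_\mu$. Then

\[ B_2^M \supseteq \tilde{A}_\omega(B_1^N) \supseteq \mu \sqrt{\frac{\log(N/M)}{M}}\, B_2^M, \]

and consequently

\[ d_1(\tilde{A}_\omega(B_1^N), B_2^M) \le \left( \mu \sqrt{\frac{\log(N/M)}{M}} \right)^{-1}. \tag{34} \]

The next step is to note that $\mathrm{conv}(B_p^N) = B_1^N$ and consequently

\[ \mathrm{conv}\big(\tilde{A}_\omega(B_p^N)\big) = \tilde{A}_\omega\big(\mathrm{conv}(B_p^N)\big) = \tilde{A}_\omega(B_1^N). \]

We can now invoke Lemma 4.2 to conclude that

\[ d_1(\tilde{A}_\omega(B_p^N), B_2^M) \le C(p)\, d_1\big(\mathrm{conv}(\tilde{A}_\omega(B_p^N)), B_2^M\big)^{\frac{2-p}{p}} = C(p)\, d_1(\tilde{A}_\omega(B_1^N), B_2^M)^{\frac{2-p}{p}}. \tag{35} \]

Finally, by using (34), we find that

\[ d_1(\tilde{A}_\omega(B_p^N), B_2^M) \le C(p) \left( \mu^2 \frac{\log(N/M)}{M} \right)^{1/2 - 1/p}, \tag{36} \]

and consequently

\[ \tilde{A}_\omega(B_p^N) \supseteq \frac{1}{C(p)} \left( \mu^2 \frac{\log(N/M)}{M} \right)^{1/p - 1/2} B_2^M. \]

In other words, the matrix $\tilde{A}_\omega$ has the LQ$_p(\alpha)$ property with the desired value of $\alpha$ for every $\omega \in \Omega_\mu$ with $P(\Omega_\mu) \ge 1 - e^{-cM}$. Here $c$ is as specified in Proposition 4.1.

To see that the same is true for $A_\omega$, note that there exists a set $\Omega_0$ with $P(\Omega_0) \ge 1 - e^{-\tilde{c}M}$ such that for all $\omega \in \Omega_0$, $\|A_j(\omega)\|_2 < 2$ for every column $A_j$ of $A_\omega$ (this follows from RIP). Using this observation one can trace the above proof with minor modifications. □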



4.4 Proof of Theorem 2.13



We start with the following lemma, the proof of which for $p < 1$ follows with very little modification from the analogous proof of Lemma 3.1 in [30] and shall be omitted.

Lemma 4.3 Let $0 < p < 1$ and suppose that $A$ satisfies RIP$(S, \delta)$ and LQ$_p\big(\gamma_p / S^{1/p - 1/2}\big)$ with $\gamma_p := \mu^{2/p - 1} / C(p)$. Then for every $x \in \mathbb{R}^N$, there exists $\tilde{x} \in \mathbb{R}^N$ such that

\[ A\tilde{x} = Ax, \quad \|\tilde{x}\|_p \le \frac{S^{1/p - 1/2}}{\gamma_p} \|Ax\|_2, \quad \text{and} \quad \|\tilde{x}\|_2 \le C_3 \|Ax\|_2. \]

Here, C3 = ^ + -j^Zs^)^- ^ote that C3 depends only on /i, 5 and p. 

We now proceed to prove Theorem 2.13. Our proof follows the steps of [30] and differs in the handling of the non-convexity of the $\ell^p$ quasinorms for $0 < p < 1$.



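First, recall that $A$ satisfies RIP$(S, \delta)$ and LQ$_p\big(\gamma_p / S^{1/p - 1/2}\big)$, so by Lemma 4.3 there exists $z \in \mathbb{R}^N$ such that $Az = e$, $\|z\|_p \le \frac{S^{1/p - 1/2}}{\gamma_p} \|e\|_2$, and $\|z\|_2 \le C_3 \|e\|_2$. Now, $A(x + z) = Ax + e$, and $\Delta$ is $(2, p)$ instance optimal with constant $C_{2,p}$. Thus,

\[ \|\Delta(A(x) + e) - (x + z)\|_2 \le C_{2,p} \frac{\sigma_S(x + z)_{\ell^p}}{S^{1/p - 1/2}}, \]

and consequently

\[
\begin{aligned}
\|\Delta(A(x) + e) - x\|_2 &\le \|z\|_2 + C_{2,p} \frac{\sigma_S(x + z)_{\ell^p}}{S^{1/p - 1/2}} \\
&\le C_3 \|e\|_2 + C_{2,p} \frac{\sigma_S(x + z)_{\ell^p}}{S^{1/p - 1/2}} \\
&\le C_3 \|e\|_2 + 2^{1/p - 1} C_{2,p} \frac{\sigma_S(x)_{\ell^p} + \|z\|_p}{S^{1/p - 1/2}} \\
&\le C_3 \|e\|_2 + 2^{1/p - 1} C_{2,p} \left( \frac{\sigma_S(x)_{\ell^p}}{S^{1/p - 1/2}} + \frac{\|e\|_2}{\gamma_p} \right),
\end{aligned}
\]

where in the third inequality we used the fact that in any $\ell^p$ quasinorm the inequality $\|a + b\|_p \le 2^{1/p - 1} (\|a\|_p + \|b\|_p)$ holds for all $a, b \in \mathbb{R}^N$. So, we conclude

\[ \|\Delta(A(x) + e) - x\|_2 \le \big( C_3 + 2^{1/p - 1} C_{2,p} / \gamma_p \big) \|e\|_2 + 2^{1/p - 1} C_{2,p} \frac{\sigma_S(x)_{\ell^p}}{S^{1/p - 1/2}}. \tag{38} \]

That is, (i) holds with $C = C_3 + 2^{1/p - 1} C_{2,p} (1/\gamma_p + 1)$.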
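The quasinorm inequality used in the third step, together with the $p$-triangle inequality $\|a + b\|_p^p \le \|a\|_p^p + \|b\|_p^p$ from which it follows, is easy to sanity-check numerically. The helper name `lp_quasinorm` below is hypothetical.

```python
import numpy as np

def lp_quasinorm(v, p):
    """||v||_p = (sum_i |v_i|^p)^(1/p); a quasinorm when 0 < p < 1."""
    return float(np.sum(np.abs(v) ** p) ** (1.0 / p))

def quasi_triangle_gap(a, b, p):
    """2^(1/p - 1) * (||a||_p + ||b||_p) - ||a + b||_p, which should be non-negative."""
    return 2 ** (1 / p - 1) * (lp_quasinorm(a, p) + lp_quasinorm(b, p)) - lp_quasinorm(a + b, p)
```

The factor $2^{1/p-1}$ is attained when $a$ and $b$ have disjoint supports and equal quasinorms, for instance $a = (1, 0)$ and $b = (0, 1)$.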

Next, we prove parts (ii) and (iii) of Theorem 2.13. As in the analogous proof of [30], Theorem 2.13 (ii) can be seen as a special case of Theorem 2.13 (iii), with $e = 0$. We therefore turn to proving (iii). Once again, by Lemma 4.3, there exist $v$ and $z$ in $\mathbb{R}^N$ such that the following hold:

\[ Av = e, \quad \|v\|_p \le \frac{S^{1/p - 1/2}}{\gamma_p} \|e\|_2, \quad \|v\|_2 \le C_3 \|e\|_2, \]

and

\[ Az = Ax_{T_0^c}, \quad \|z\|_p \le \frac{S^{1/p - 1/2}}{\gamma_p} \|Ax_{T_0^c}\|_2, \quad \|z\|_2 \le C_3 \|Ax_{T_0^c}\|_2. \]

Here $T_0$ is the set of indices of the largest (in magnitude) $S$ coefficients of $x$, and $T_0^c$ and $x_{T_0^c}$ are as in the proof of Theorem 2.1.

Similar to the previous part, we can see that $A(x_{T_0} + z + v) = Ax + e$, and by the hypothesis of $(2, p)$ instance optimality of $\Delta$, we have

\[ \|\Delta(Ax + e) - (x_{T_0} + z + v)\|_2 \le C_{2,p} \frac{\sigma_S(x_{T_0} + z + v)_{\ell^p}}{S^{1/p - 1/2}}. \]

Consequently, observing that $x_{T_0} = x - x_{T_0^c}$ and using the triangle inequality,

\[
\begin{aligned}
\|\Delta(A(x) + e) - x\|_2 &\le \|x_{T_0^c} - z - v\|_2 + C_{2,p} \frac{\sigma_S(x_{T_0} + z + v)_{\ell^p}}{S^{1/p - 1/2}} \\
&\le \|x_{T_0^c} - z - v\|_2 + 2^{1/p - 1} C_{2,p} \frac{\|z\|_p + \|v\|_p}{S^{1/p - 1/2}} \\
&\le \sigma_S(x)_{\ell^2} + \|z\|_2 + \|v\|_2 + 2^{1/p - 1} C_{2,p} \frac{\|z\|_p + \|v\|_p}{S^{1/p - 1/2}} \\
&\le \sigma_S(x)_{\ell^2} + \Big( C_3 + 2^{1/p - 1} \frac{C_{2,p}}{\gamma_p} \Big) \big( \|e\|_2 + \|Ax_{T_0^c}\|_2 \big).
\end{aligned} \tag{39}
\]

That is, (iii) holds with $C = 1 + C_3 + 2^{1/p - 1} \frac{C_{2,p}}{\gamma_p}$. By setting $e = 0$, one can see that this is the same constant associated with (ii). This concludes the proof of this theorem. □



4.5 Proof of Theorem 2.14



First, we show that $(A_\omega, \Delta_p)$ is $(2, p)$ instance optimal of order $S$ for an appropriate range of $S$ with high probability. One of the fundamental results in compressed sensing theory states that for any $\delta \in (0, 1)$, there exist $\tilde{c}_1, \tilde{c}_2 > 0$ and $\Omega_{\mathrm{RIP}}$ with $P(\Omega_{\mathrm{RIP}}) \ge 1 - 2e^{-\tilde{c}_2 M}$, all depending only on $\delta$, such that $A_\omega$, $\omega \in \Omega_{\mathrm{RIP}}$, satisfies RIP$(\ell, \delta)$ for any $\ell \le \tilde{c}_1 \frac{M}{\log(N/M)}$. See, e.g., [6], [1], for the proof of this statement as well as for the explicit values of the constants. Now, choose $\delta \in (0, 1)$ such that $\delta < \frac{2^{2/p - 1} - 1}{2^{2/p - 1} + 1}$. Then, with $\tilde{c}_1$, $\tilde{c}_2$, and $\Omega_{\mathrm{RIP}}$ as above,
for every to G ^rip and for every S < y iog{N/M) ' constants of A^ 

satisfy ^ (and hence ([12])), with k = 2. Thus, by Corollary [23] (A^, Ap) is 
instance optimal of order S with constant C^^ as in ([T^ . 

Now, set $S_1 = c_1 \frac{M}{\log(N/M)}$ with $c_1 \le \tilde{c}_1 / 3$ such that $S_1 \in \mathbb{N}$ (note that such a $c_1$ exists if $M$ and $N$ are sufficiently large). By the hypothesis of the theorem, $M$ and $N$ satisfy the hypothesis of Lemma 2.12 with $\xi = 2$, $K_1 = 1$, some $0 < \mu < 1/2$, and an appropriate $K_2$ (determined by $c_1$ above). Because

by Lemma 2.12 there exists $\Omega_\mu$, $P(\Omega_\mu) \ge 1 - e^{-cM}$, such that for every $\omega \in \Omega_\mu$, $A_\omega$ satisfies LQ$_p\Big( \frac{\gamma_p(\mu)}{S_1^{1/p - 1/2}} \Big)$ where $\gamma_p(\mu) := \frac{\mu^{2/p - 1}}{C(p)}$. Consequently, set $\Omega_1 := \Omega_{\mathrm{RIP}} \cap \Omega_\mu$. Then, $P(\Omega_1) \ge 1 - 2e^{-\tilde{c}_2 M} - e^{-cM} \ge 1 - 3e^{-c_2 M}$, with $c_2 = \min\{\tilde{c}_2, c\}$. Note that $c_2$ depends on $c$, which is now a universal constant, and $\tilde{c}_2$, which depends only on the distribution of $A_\omega$ (and in particular its concentration of measure properties, see [1]). Now, if $\omega \in \Omega_1$, $A_\omega$ satisfies RIP$(3S_1, \delta)$, thus RIP$(S_1, \delta)$, as well as LQ$_p\Big( \frac{\gamma_p(\mu)}{S_1^{1/p - 1/2}} \Big)$. Therefore we can apply part (i) of Theorem 2.13 to get the first part of this theorem, i.e.,

\[ \|\Delta_p(A_\omega(x) + e) - x\|_2 \le C \left( \|e\|_2 + \frac{\sigma_{S_1}(x)_{\ell^p}}{S_1^{1/p - 1/2}} \right). \tag{40} \]

Here $C$ is as in (38) with $C_{2,p} = C_2^{1/p}$. To finish the proof of part (i), note that for $S \le S_1$, $\sigma_{S_1}(x)_{\ell^p} \le \sigma_S(x)_{\ell^p}$ and $S^{1/p - 1/2} \le S_1^{1/p - 1/2}$.

To prove part (ii), first define $T_0$ as the support of the $S_1$ largest coefficients (in magnitude) of $x$ and $T_0^c = \{1, \dots, N\} \setminus T_0$. Now, note that for any $x$ there exists a set $\Omega_x$ with $P(\Omega_x) \ge 1 - e^{-\tilde{c}M}$ for some universal constant $\tilde{c} > 0$, such that for all $\omega \in \Omega_x$, $\|A_\omega x_{T_0^c}\|_2 \le 2\|x_{T_0^c}\|_2 = 2\sigma_{S_1}(x)_{\ell^2}$ (this follows from the concentration of measure property of Gaussian matrices, see, e.g., [1]). Define $\tilde{\Omega}_x := \Omega_x \cap \Omega_1$. Thus, $P(\tilde{\Omega}_x) \ge 1 - 3e^{-c_2 M} - e^{-\tilde{c}M} \ge 1 - 4e^{-c_3 M}$ where $c_3 = \min\{c_2, \tilde{c}\}$. Note that the dependencies of $c_3$ are identical to those of $c_2$ discussed above. Recall that for $\omega \in \Omega_1$, $A_\omega$ satisfies both RIP$(S_1, \delta)$ and LQ$_p\Big( \frac{\gamma_p(\mu)}{S_1^{1/p - 1/2}} \Big)$. We can now apply part (iii) of Theorem 2.13 to obtain, for $\omega \in \tilde{\Omega}_x$,

\[ \|\Delta_p(A_\omega(x) + e) - x\|_2 \le C \big( 3\sigma_{S_1}(x)_{\ell^2} + \|e\|_2 \big). \tag{41} \]

Above, the constant $C$ is as in (39). Once again, note that for $S \le S_1$, $\sigma_{S_1}(x)_{\ell^2} \le \sigma_S(x)_{\ell^2}$ to finish the proof for any $S \le S_1$. □



5 Appendix: Proof of Lemma 4.2



In this section we provide the proof of Lemma 4.2 for the sake of completeness and also because we explicitly calculate the optimal constants involved. Let us first introduce some notation used in [18] and [23].

For a body $K \subset \mathbb{R}^n$, define its gauge functional by $\|x\|_K := \inf\{t > 0 : x \in tK\}$, and let $T_q(K)$, $q \in (1, 2]$, be the smallest constant $C$ such that

\[ \forall m \in \mathbb{N},\ x_1, \dots, x_m \in K: \quad \min_{\epsilon_i = \pm 1} \Big\| \sum_{i=1}^{m} \epsilon_i x_i \Big\|_K \le C m^{1/q}. \]

Given a $p$-convex body $K$ and a positive integer $r$, define

\[ a_r = a_r(K) := \sup\left\{ \frac{\big\| \sum_{i=1}^{r} x_i \big\|_K}{r} : x_i \in K,\ i \le r \right\}. \]

Note that $a_r \le r^{-1 + 1/p}$.

Finally, conforming with the notation used in [18] and [23], we define $\delta_K := d_1(K, \mathrm{conv}(K))$. Note that this should not cause confusion as we do not refer to the RIP constants throughout the rest of the paper. It can be shown by a result of [24] that $\delta_K = \sup_r a_r(K)$, cf. [18, Lemma 1] for a proof.

We will need the following propositions. 

Proposition 5.1 (sub-additivity of $\|\cdot\|_K^p$) For the gauge functional $\|\cdot\|_K$ associated with a $p$-convex body $K \subset \mathbb{R}^n$, the following inequality holds for any $x, y \in \mathbb{R}^n$:

\[ \|x + y\|_K^p \le \|x\|_K^p + \|y\|_K^p. \tag{42} \]

PROOF. Let $r = \|x\|_K$ and $u = \|y\|_K$. If at least one of $r$ and $u$ is zero, then (42) holds trivially. (Note that, as $K$ is a body, $\|x\|_K = 0$ if and only if $x = 0$.) So, we may assume that both $r$ and $u$ are strictly positive. Since $K$ is compact, it follows that $x/r \in K$ and $y/u \in K$. Furthermore, $K$ is $p$-convex, i.e., for all $\alpha, \beta \in [0, 1]$ with $\alpha + \beta = 1$, we have $\alpha^{1/p} x/r + \beta^{1/p} y/u \in K$. In particular, choose $\alpha = \frac{r^p}{r^p + u^p}$ and $\beta = \frac{u^p}{r^p + u^p}$. This gives $\frac{x + y}{(r^p + u^p)^{1/p}} \in K$. Consequently, by the definition of the gauge functional, $\Big\| \frac{x + y}{(r^p + u^p)^{1/p}} \Big\|_K \le 1$. Finally, $\Big\| \frac{x + y}{(r^p + u^p)^{1/p}} \Big\|_K^p = \frac{\|x + y\|_K^p}{r^p + u^p} \le 1$ and $\|x + y\|_K^p \le r^p + u^p = \|x\|_K^p + \|y\|_K^p$. □



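Proposition 5.2 $T_2(B_2^n) = 1$.

PROOF. Note that $\|\cdot\|_{B_2^n} = \|\cdot\|_2$, and thus, by definition, $T_2(B_2^n)$ is the smallest constant $C$ such that for every positive integer $m$ and for every choice of points $x_1, \dots, x_m \in B_2^n$,

\[ \min_{\epsilon_i = \pm 1} \Big\| \sum_{i=1}^{m} \epsilon_i x_i \Big\|_2 \le C \sqrt{m}. \tag{43} \]

For $m \le n$, we can choose $\{x_1, \dots, x_m\}$ to be orthonormal. Consequently,

\[ \Big\| \sum_{i=1}^{m} \epsilon_i x_i \Big\|_2^2 = \sum_{i=1}^{m} \epsilon_i^2 = m, \]

and thus, $T_2 = T_2(B_2^n) \ge 1$. On the other hand, let $m$ be an arbitrary positive integer, and suppose that $\{x_1, \dots, x_m\} \subset B_2^n$. Then, it is easy to show that there exists a choice of signs $\epsilon_i$, $i = 1, \dots, m$, such that

\[ \Big\| \sum_{i=1}^{m} \epsilon_i x_i \Big\|_2 \le \sqrt{m}. \]

Indeed, we will show this by induction. First, note that $\|\epsilon_1 x_1\|_2 = \|x_1\|_2 \le \sqrt{1}$. Next, assume that there exist $\epsilon_1, \dots, \epsilon_{k-1}$ such that

\[ \Big\| \sum_{i=1}^{k-1} \epsilon_i x_i \Big\|_2 \le \sqrt{k - 1}. \]

Then (using the parallelogram law),

\[ \min\left\{ \Big\| \sum_{i=1}^{k-1} \epsilon_i x_i + x_k \Big\|_2^2,\ \Big\| \sum_{i=1}^{k-1} \epsilon_i x_i - x_k \Big\|_2^2 \right\} \le \Big\| \sum_{i=1}^{k-1} \epsilon_i x_i \Big\|_2^2 + \|x_k\|_2^2 \le k. \]

Choosing $\epsilon_k$ accordingly, we get

\[ \Big\| \sum_{i=1}^{k} \epsilon_i x_i \Big\|_2^2 \le k, \]

which implies that $T_2 \le 1$. Using the fact that $T_2 \ge 1$, which we showed above, we conclude that $T_2 = 1$. □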
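The induction in the proof of Proposition 5.2 is constructive: choosing each sign greedily, so that the running sum has the smaller norm, already achieves the bound $\sqrt{m}$. A minimal sketch, with a hypothetical function name:

```python
import numpy as np

def greedy_signs(X):
    """Given rows x_1..x_m of X in the Euclidean unit ball, pick signs eps_i = +/-1
    one at a time so that ||sum_i eps_i x_i||_2 <= sqrt(m)."""
    s = np.zeros(X.shape[1])
    signs = []
    for x in X:
        # By the parallelogram law, the smaller of ||s + x||^2 and ||s - x||^2
        # is at most ||s||^2 + ||x||^2.
        eps = 1.0 if np.linalg.norm(s + x) <= np.linalg.norm(s - x) else -1.0
        s = s + eps * x
        signs.append(eps)
    return np.array(signs), s
```

The function returns both the signs and the signed sum, so the bound can be verified directly on random inputs.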

Proof of Lemma 4.2

We now present a proof of the more general form of Lemma 4.2 as stated in [18] and [23] (albeit for the Banach-Mazur distance in place of $d_1$). The proof is essentially as in [18], cf. [23], which in fact also works with the distance $d_1$ to establish an upper bound on the Banach-Mazur distance between a $p$-convex body and a symmetric body.

Lemma 5.3 Let $0 < p < 1$, $q \in (1, 2]$, and let $K$ be a $p$-convex body. Suppose that $B$ is a symmetric body with respect to the origin such that $\mathrm{conv}(K) \subseteq B$. Then

\[ d_1(K, B) \le C_{p,q}\, [T_q(B)]^{\varphi - 1}\, [d_1(\mathrm{conv}(K), B)]^{\varphi}, \]

where $\varphi = \frac{1/p - 1/q}{1 - 1/q}$.

PROOF. Note that $K \subseteq \mathrm{conv}(K) \subseteq B$, and therefore $d_1(K, B)$ is well-defined. Let $d = d_1(K, B)$ and $T = T_q(B)$. Thus, $(1/d)B \subseteq K \subseteq B$. Let $m$ be a positive integer and let $x_i$, $i \in \{1, 2, \dots, 2^m\}$, be a collection of points in $K$. Then, $x_i \in B$ and by the definition of $T$, there is a choice of signs $\epsilon_i$ so that $\big\| \sum_{i=1}^{2^m} \epsilon_i x_i \big\|_B \le T 2^{m/q}$. Since $B$ is symmetric, we can assume that $D = \{i : \epsilon_i = 1\}$ has $|D| \ge 2^{m-1}$. Now we can write

\[
\begin{aligned}
\Big\| \sum_{i=1}^{2^m} x_i \Big\|_K^p &= \Big\| \sum_{i=1}^{2^m} \epsilon_i x_i + 2 \sum_{i \notin D} x_i \Big\|_K^p \le d^p \Big\| \sum_{i=1}^{2^m} \epsilon_i x_i \Big\|_B^p + 2^p \Big\| \sum_{i \notin D} x_i \Big\|_K^p \\
&\le \big( d T 2^{m/q} \big)^p + 2^{mp} a_{2^{m-1}}^p,
\end{aligned} \tag{44}
\]

where the first inequality uses the sub-additivity of $\|\cdot\|_K^p$ and the fact that $(1/d)B \subseteq K$. Thus by taking the supremum in (44) over all possible $x_i$'s and dividing by $2^{mp}$, we obtain, for any $m$,

\[ a_{2^m}^p \le d^p T^p 2^{mp/q - mp} + a_{2^{m-1}}^p. \]

By applying this inequality for $m - 1, m - 2, \dots, k$, we obtain the following inequality for any $k < m$:

\[ a_{2^m}^p \le d^p T^p \sum_{i=k+1}^{\infty} 2^{-ip(1 - 1/q)} + a_{2^k}^p \le d^p T^p \frac{2^{-kp(1 - 1/q)}}{p(1 - 1/q) \log 2} + 2^{k(1-p)}. \tag{45} \]

Since $\delta_K = \sup_r a_r$, we now want to minimize the right hand side in (45) by choosing $k$ appropriately. To that end, define

\[ f(k) := 2^{k(1-p)} + (dT)^p \frac{2^{-kp(1 - 1/q)}}{p(1 - 1/q) \log 2} \]

and 



p(l-l/g)log2" 

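Since $a_{2^m}^p \le f(k)$ for any $k \in \{1, \dots, m - 1\}$, the best bound on $a_{2^m}$ is essentially given by $f(k^*)$, where $f'(k^*) = 0$. However, since $k^*$ is not necessarily an integer (which we require), we will instead use $f(k^* + 1) \ge f(\lceil k^* \rceil) \ge f(k^*)$ as a bound. Thus, we solve $f'(k^*) = 0$ to obtain

\[ k^* = \log_2 \left( \frac{(dT)^p}{(1-p) \log 2} \right)^{\frac{1}{1 - p/q}}. \]

By evaluating $f(k)$ at $k^* + 1$, we obtain $a_{2^m} \le (f(k^* + 1))^{1/p}$ for every $m > k^* + 1$. In other words, for every $m > k^* + 1$, we have

\[ a_{2^m} \le (dT)^{\frac{1-p}{1 - p/q}} \left( 2^{1-p} + \frac{(1-p)\, 2^{-p(1 - 1/q)}}{p(1 - 1/q)} \right)^{1/p} \left( \frac{1}{(1-p) \log 2} \right)^{\frac{1-p}{p(1 - p/q)}}. \tag{46} \]

On the other hand, if $m \le k^*$, then $a_{2^m}^p \le 2^{m(1-p)} \le 2^{(k^*+1)(1-p)}$. However, this last bound is one of the summands in the right hand side of (45) with $k = k^* + 1$ (which we provide a bound for in (46)). Consequently, (46) holds for all $m$. In particular, it holds for the value of $m$ which achieves the supremum of $a_{2^m}$. Since $\delta_K = \sup_r a_r$, we obtain

\[ \delta_K \le (dT)^{\frac{1-p}{1 - p/q}} \left( 2^{1-p} + \frac{(1-p)\, 2^{-p(1 - 1/q)}}{p(1 - 1/q)} \right)^{1/p} \left( \frac{1}{(1-p) \log 2} \right)^{\frac{1-p}{p(1 - p/q)}}. \tag{47} \]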



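Remark 5.4 In the previous step we utilize the fact that in the derivations above we can replace every $2^m$ and $2^k$ with $m$ and $k$ respectively, thus every $m$ and $k$ with $\log_2 m$ and $\log_2 k$, without changing (46). This allows us to pass from the bound on $a_{2^m}$ to $\delta_K = \sup_r a_r$ without any problems.

Recalling the definitions of $d_1(\mathrm{conv}(K), B)$ and $\delta_K$, note the following inclusions:

\[ \frac{1}{\delta_K\, d_1(\mathrm{conv}(K), B)} B \subseteq \frac{1}{\delta_K} \mathrm{conv}(K) \subseteq K \subseteq \mathrm{conv}(K) \subseteq B. \tag{48} \]

Consequently, $\frac{1}{\delta_K\, d_1(\mathrm{conv}(K), B)} B \subseteq K \subseteq B$, and the inequality

\[ d_1(K, B) = d \le \delta_K\, d_1(\mathrm{conv}(K), B) \tag{49} \]

follows from the definition of $d_1(K, B)$. Combining (49) and (47), we complete the proof with

\[ C_{p,q} = \left( 2^{1-p} + \frac{(1-p)\, 2^{-p(1 - 1/q)}}{p(1 - 1/q)} \right)^{\frac{1 - p/q}{p^2 (1 - 1/q)}} \left( \frac{1}{(1-p) \log 2} \right)^{\frac{1-p}{p^2 (1 - 1/q)}}. \]

□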



Finally, we choose above $B = B_2^n$ and $q = 2$, recall that $T = T_2(B_2^n) = 1$ (see Proposition 5.2), and obtain Lemma 4.2 as a corollary with

\[ C(p) = \left( 2^{1-p} + \frac{(1-p)\, 2^{1 - p/2}}{p} \right)^{\frac{2-p}{p^2}} \left( \frac{1}{(1-p) \log 2} \right)^{\frac{2-2p}{p^2}}. \tag{50} \]
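To get a feel for the size of these constants, the closed forms can be evaluated directly. The sketch below assumes the expressions for $C_{p,q}$ and $C(p)$ as reconstructed in this section from the derivation in the proof of Lemma 5.3 (with $\log$ the natural logarithm); the function names are hypothetical.

```python
from math import log

def C_pq(p, q):
    """Constant of Lemma 5.3 (assumed form), for 0 < p < 1 and 1 < q <= 2."""
    a = 2 ** (1 - p) + (1 - p) * 2 ** (-p * (1 - 1 / q)) / (p * (1 - 1 / q))
    b = 1 / ((1 - p) * log(2))
    e = p * p * (1 - 1 / q)
    return a ** ((1 - p / q) / e) * b ** ((1 - p) / e)

def C_p(p):
    """Constant of Lemma 4.2, i.e. the q = 2 specialization written as in (50)."""
    a = 2 ** (1 - p) + (1 - p) * 2 ** (1 - p / 2) / p
    b = 1 / ((1 - p) * log(2))
    return a ** ((2 - p) / p ** 2) * b ** ((2 - 2 * p) / p ** 2)
```

Setting `q = 2` in `C_pq` reproduces `C_p`, which is a quick consistency check between the general constant and (50).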